Capsule Description


Effective  management  of  any  process  requires  quan¬ 
tification,  measurement,  and  modeling.  Software 
metrics  provide  a  quantitative  basis  for  the  develop¬ 
ment  and  validation  of  models  of  the  software  devel¬ 
opment  process.  Metrics  can  be  used  to  improve 
software  productivity  and  quality.  This  module  in¬ 
troduces  the  most  commonly  used  software  metrics 
and  reviews  their  use  in  constructing  models  of  the 
software  development  process.  Although  current 
metrics  and  models  are  certainly  inadequate,  a  num¬ 
ber  of  organizations  are  achieving  promising  results 
through  their  use.  Results  should  improve  further  as 
we  gain  additional  experience  with  various  metrics 
and  models. 


Philosophy


It has been noted frequently that we are experiencing
a software crisis, characterized by our inability to
produce correct, reliable software within budget and
on time. No doubt, many of our failures are caused
by the inherent complexity of the software development
process, for which there often is no analytical
description. These problems can be ameliorated,
however, by improving our software management
capabilities. This requires both the development of
improved software metrics and improved utilization
of such metrics.

Unfortunately, the current state of software metrics
is confused. Many metrics have been invented.
Most of these have been defined and then tested only
in a limited environment, if at all. In some cases,
remarkable successes have been reported in the initial
application or validation of these metrics. However,
subsequent attempts to test or use the metrics
in other situations have yielded very different
results. One part of the problem is that we have
failed to identify a commonly accepted set of software
properties to be measured. As a result, the
same metric has been used to measure very different
software properties. Moreover, we have virtually no
theoretical models and a multitude of metrics, only a
few of which have enjoyed any widespread use or
acceptance.

Faced with this situation, the author has chosen to
indicate the great diversity of metrics that have been
proposed and to discuss some of the most common
ones in detail. In the process, the underlying assumptions,
environment of application, and validity
of various metrics are examined. The author believes
that current metrics and models are far from
perfect, but that properly applied metrics and models
can provide significant improvements in the software
development process.


Objectives 


The  following  is  a  list  of  possible  educational  objec¬ 
tives  based  upon  the  material  in  this  module.  Objec¬ 
tives  for  any  particular  unit  of  instruction  may  be 
drawn  from  these  or  related  objectives,  as  may  be 
appropriate  to  audience  and  circumstance.  (See 
Teaching  Considerations  for  further  suggestions.) 

Cognitive  Domain 

1. (Knowledge) The student can define the
basic  terminology  and  state  fundamental 
facts  about  software  metrics  and  process 
models.  (For  example,  identify  the 
metrics  and  models  that  have  been  pro¬ 
posed  and  used  by  significant  numbers  of 
people.) 

2.  (Comprehension)  The  student  can  ex¬ 
plain  the  metrics  and  models  discussed  in 
the  module  and  summarize  the  essential 
characteristics  of  each. 

3. (Application) The student can calculate
the values of the metrics discussed for
specific examples of software products
or processes. (For example, compute
LOC or v(G) for specific programs or apply
the COCOMO model to the development
process for a specified product.)

4.  (Analysis)  The  student  can  identify  the 
essential  elements  of  a  given  metric  or 
model,  describe  the  interrelationships 
among  its  various  elements,  and  discuss 
the  circumstances  or  environments  in 
which  its  use  is  appropriate. 

5.  (Synthesis)  The  student  can  develop  a 
plan  for  a  metrics  program  for  a  software 
development  organization,  using  pre¬ 
scribed  metrics. 

6.  (Evaluation)  The  student  can  evaluate  a 
metrics  program  by  analyzing  the 
metrics  and  models  in  use  and  making 
judgments  concerning  their  application 
in  a  particular  environment. 

Affective  Domain 

1. The student will realize the difficulty and
effort  involved  in  establishing  precise, 
reliable  software  metrics  and  models. 

2.  The  student  will  appreciate  the  impor¬ 
tance  of  software  metrics  in  the  control 
and  management  of  the  software  devel¬ 
opment  process. 

3.  The  student  will  be  more  likely  to  sup¬ 
port  implementation  and  use  of  appropri¬ 
ate  software  metrics. 


Prerequisite  Knowledge 


The  following  are  recommended  prerequisites  for 
the  study  of  software  metrics: 

1.  Knowledge  of  basic  statistics  and  experi¬ 
mental  design. 

2. Basic understanding of commonly used
software life cycle models, at least to the
level covered in an introductory senior-
or graduate-level software engineering
course.

3.  Experience  working  as  a  team  member 
on  a  software  development  project. 

The reason for the statistical prerequisite may not be
immediately obvious. Exploring and validating software
metrics requires sound statistical methods and
unbiased experimental designs. The student needs to
understand the fundamentals of experiment design,
know what methods are available for data analysis,
and be able to select appropriate techniques in specific
circumstances. Furthermore, the student needs
to understand the concept of statistical significance
and how to test for it in the analyses usually performed
to validate software metrics. Of particular
interest are various correlation techniques, regression
analysis, and statistical tests for significance.

The need for familiarity with the typical software
development cycle and experience with software
development should be self-evident.

These  prerequisites  are,  in  the  author’s  view,  essen¬ 
tial  for  attaining  the  cognitive  objectives  listed 
above.  Prerequisites  for  any  particular  unit  of  in¬ 
struction,  of  course,  depend  upon  specific  teaching 
objectives.  (See  Teaching  Considerations.) 

Module Content


Outline

I.  Introduction 

1.  The  “Software  Crisis” 

2.  The  Need  for  Software  Metrics 

3.  Definition  of  Software  Metrics 

4.  Classification  of  Software  Metrics 

5.  Measurement  Scales  for  Software  Metrics 

6.  Current  State  of  Software  Metrics 

II.  Product  Metrics 

1.  Size  Metrics 

a.  Lines  of  Code 

b.  Function  Points 

c.  Bang 

2.  Complexity  Metrics 

a.  Cyclomatic  Complexity — v(G) 

b.  Extensions  to  v(G) 

c.  Knots 

d.  Information  Flow 

3.  Halstead’s  Product  Metrics 

a.  Program  Vocabulary 

b.  Program  Length 

c.  Program  Volume 

4.  Quality  Metrics 

a.  Defect  Metrics 

b.  Reliability  Metrics 

c.  Maintainability  Metrics 

III.  Process  Metrics,  Models,  and  Empirical 
Validation 

1.  General  Considerations 

2.  Empirical  Models 

3.  Statistical  Models 

4.  Theory-Based  Models 

a.  Rayleigh  Model 

b. Software Science Model (Halstead)

5. Composite Models

a. COCOMO (Boehm)

b. SOFTCOST (Tausworthe)

c. SPQR Model (Jones)

d. COPMO (Thebaut)

e. ESTIMACS (Rubin)

6.  Reliability  Models 

IV.  Implementation  of  a  Metrics  Program 

1.  Planning  Process 

a.  Defining  Objectives 

b.  Initial  Estimates  of  Effort  and  Cost 

2.  Selection  of  Model  and  Metrics 

a.  Projected  Ability  to  Meet  Objectives 

b.  Estimated  Data  Requirements  and  Cost 

3.  Data  Requirements  and  Database  Maintenance 

a.  Specific  Data  Required 

b.  Data  Gathering  Procedures 

c.  Database  Maintenance 

d.  Refined  Estimates  of  Efforts  and  Costs 

4.  Initial  Implementation  and  Use  of  the  Model 

a.  Clarification  of  Use 

b.  Responsible  Personnel 

5.  Continuing  Use  and  Refinement 

a.  Evaluating  Results 

b.  Adjusting  the  Model 

V.  Trends  in  Software  Metrics 


Annotated Outline

I.  Introduction 
1.  The  “Software  Crisis” 

It  has  been  estimated  that,  by  1990,  fully  one  half  of 
the  American  work  force  will  rely  on  computers  and 
software  to  do  its  daily  work.  As  computer  hard¬ 
ware  costs  continue  to  decline,  the  demand  for  new 
applications  software  continues  to  increase  at  a  rapid 
rate.  The  existing  inventory  of  software  continues  to 
grow,  and  the  effort  required  to  maintain  it  continues 
to  increase  as  well.  At  the  same  time,  there  is  a 
significant  shortage  of  qualified  software  profes¬ 
sionals.  Combining  these  factors,  one  might  project 
that  at  some  point  in  the  not-too-distant  future,  every 
American  worker  will  have  to  be  involved  in  soft¬ 
ware  development  and  maintenance.  Meanwhile, 
the  software  development  scene  is  often  charac¬ 
terized  by: 

•  schedule and cost estimates that are grossly
inaccurate,

•  software of poor quality, and

•  a  productivity  rate  that  is  increasing  more 
slowly  than  the  demand  for  software. 

This  situation  has  often  been  referred  to  as  the 
“software  crisis”  [Arthur85]. 

2.  The  Need  for  Software  Metrics 

The  software  crisis  must  be  addressed  and,  to  the 
extent  possible,  resolved.  To  do  so  requires  more 
accurate  schedule  and  cost  estimates,  better  quality 
products,  and  higher  productivity.  All  these  can  be 
achieved  through  more  effective  software  manage¬ 
ment,  which,  in  turn,  can  be  facilitated  by  the  im¬ 
proved  use  of  software  metrics.  Current  software 
management  is  ineffective  because  software  devel¬ 
opment  is  extremely  complex,  and  we  have  few 
well-defined,  reliable  measures  of  either  the  process 
or  the  product  to  guide  and  evaluate  development. 
Thus, accurate and effective estimating, planning,
and control are nearly impossible to achieve
[Rubin83]. Improvement of the management process
depends upon improved ability to identify, measure,
and  control  essential  parameters  of  the  development 
process.  This  is  the  goal  of  software  metrics — the 
identification  and  measurement  of  the  essential 
parameters  that  affect  software  development. 

Software metrics and models have been proposed
and used for some time [Wolverton74, Perlis81].
Metrics, however, have rarely been used in any regular,
methodical fashion. Recent results indicate
that the conscientious implementation and application
of a software metrics program can help achieve
better management results, both in the short run (for
a given project) and in the long run (improving
productivity on future projects) [Grady87]. Most
software  metrics  cannot  meaningfully  be  discussed 
in  isolation  from  such  metrics  programs.  Better  use 
of  existing  metrics  and  development  of  improved 
metrics  appear  to  be  important  factors  in  the  resolu¬ 
tion  of  the  software  crisis. 

3.  Definition  of  Software  Metrics 

It  is  important  to  further  define  the  term  software 
metrics  as  used  in  this  module.  Essentially,  software 
metrics  deals  with  the  measurement  of  the  software 
product  and  the  process  by  which  it  is  developed.  In 
this  discussion,  the  software  product  should  be 
viewed  as  an  abstract  object  that  evolves  from  an 
initial statement of need to a finished software system,
including source and object code and the
various forms of documentation produced during development.
Ordinarily, these measurements of the
software process and product are studied and developed
for use in modeling the software development
process. These metrics and models are then used to
estimate/predict product costs and schedules and to
measure productivity and product quality. Information
gained from the metrics and the model can then
be used in the management and control of the development
process, leading, one hopes, to improved
results.

Good  metrics  should  facilitate  the  development  of 
models  that  are  capable  of  predicting  process  or 
product  parameters,  not  just  describing  them.  Thus, 
ideal metrics should be:

•  simple,  precisely  definable — so  that  it  is 
clear  how  the  metric  can  be  evaluated; 

•  objective,  to  the  greatest  extent  possible; 

•  easily  obtainable  (i.e.,  at  reasonable  cost); 

•  valid — the  metric  should  measure  what  it 
is  intended  to  measure;  and 

•  robust — relatively  insensitive  to  (intuitive¬ 
ly)  insignificant  changes  in  the  process  or 
product. 

In  addition,  for  maximum  utility  in  analytic  studies 
and  statistical  analyses,  metrics  should  have  data 
values  that  belong  to  appropriate  measurement 
scales [Conte86, Basili84].

It  has  been  observed  that  the  fundamental  qualities 
required  of  any  technical  system  are  [Ferrari86]: 

•  functionality — correctness,  reliability,  etc.; 

•  performance — response  time,  throughput, 
speed,  etc.;  and 

•  economy — cost  effectiveness. 

So  far  as  this  author  can  discern,  software  metrics, 
as  the  term  is  most  commonly  used  today,  concerns 
itself  almost  exclusively  with  the  first  and  last  of  the 
above characteristics, i.e., functionality and economy.
Performance is certainly important, but it is
not generally included in discussions of software
metrics, except regarding whether the product meets
the specific performance requirements for that product.
The evaluation of performance is often treated extensively
by those engaged in performance evaluation
studies, but these are not generally included in
what is referred to as software metrics [Ferrari86].

It  is  possible  that,  in  the  future,  the  scope  of  software 
metrics  may  be  expanded  to  include  performance 
evaluation,  or  that  both  activities  may  be  considered 
part  of  a  larger  area  that  might  be  called  software 
measurement.  For  now,  however,  this  module  will 
confine  itself  to  software  metrics  as  defined  above. 

4.  Classification  of  Software  Metrics 

Software  metrics  may  be  broadly  classified  as  either 
product  metrics  or  process  metrics.  Product  metrics 
are  measures  of  the  software  product  at  any  stage  of 
its  development,  from  requirements  to  installed  sys¬ 
tem.  Product  metrics  may  measure  the  complexity 
of  the  software  design,  the  size  of  the  final  program 
(either  source  or  object  code),  or  the  number  of 
pages of documentation produced. Process metrics,
on the other hand, are measures of the software
development process, such as overall development
time, type of methodology used, or the average level
of experience of the programming staff.

In  addition  to  the  distinction  between  product  and 
process  metrics,  software  metrics  can  be  classified  in 
other  ways.  One  may  distinguish  objective  from 
subjective  properties  (metrics).  Generally  speaking, 
objective  metrics  should  always  result  in  identical 
values  for  a  given  metric,  as  measured  by  two  or 
more  qualified  observers.  For  subjective  metrics, 
even  qualified  observers  may  measure  different 
values  for  a  given  metric,  since  their  subjective 
judgment  is  involved  in  arriving  at  the  measured 
value.  For  product  metrics,  the  size  of  the  product 
measured  in  lines  of  code  (LOC)  is  an  objective 
measure,  for  which  any  informed  observer,  working 
from  the  same  definition  of  LOC,  should  obtain  the 
same measured value for a given program. An example
of a subjective product metric is the classification
of the software as “organic,” “semi-detached,”
or “embedded,” as required in the COCOMO
cost estimation model [Boehm81]. Although
most programs might be easy to classify, those on
the borderline between categories might reasonably
be classified in different ways by different knowledgeable
observers. For process metrics, development
time is an example of an objective measure,
and  level  of  programmer  experience  is  likely  to  be  a 
subjective  measure. 

Another way in which metrics can be categorized is
as primitive metrics or computed metrics [Grady87].
Primitive metrics are those that can be directly observed,
such as the program size (in LOC), number
of defects observed in unit testing, or total development
time for the project. Computed metrics are
those that cannot be directly observed but are computed
in some manner from other metrics. Examples
of computed metrics are those commonly
used for productivity, such as LOC produced per
person-month (LOC/person-month), or for product
quality, such as the number of defects per thousand
lines of code (defects/KLOC). Computed metrics
are combinations of other metric values and thus are
often more valuable in understanding or evaluating
the software process than are simple metrics.
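
The two computed metrics named above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample figures are hypothetical and do not come from any cited study.

```python
def loc_per_person_month(loc: int, person_months: float) -> float:
    """Productivity: a computed metric built from two primitive metrics."""
    if person_months <= 0:
        raise ValueError("person_months must be positive")
    return loc / person_months


def defects_per_kloc(defects: int, loc: int) -> float:
    """Quality: defects per thousand lines of code."""
    if loc <= 0:
        raise ValueError("loc must be positive")
    return defects / (loc / 1000)


if __name__ == "__main__":
    # Hypothetical primitive metrics for one project.
    size_loc = 12_000
    effort_pm = 24.0
    defects_found = 36

    print(loc_per_person_month(size_loc, effort_pm))
    print(defects_per_kloc(defects_found, size_loc))
```

Each function takes only primitive metrics as input, which mirrors the distinction drawn in the text: the computed value is never observed directly.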

Although  software  metrics  can  be  neatly  categorized 
as  primitive  objective  product  metrics,  primitive 
subjective  product  metrics,  etc.,  this  module  does 
not  strictly  follow  that  organization.  Rather,  the  dis¬ 
cussion  reflects  areas  where  most  of  the  published 
work  has  been  concentrated;  no  exhaustive  coverage 
of  all  possible  types  of  software  metrics  is  attempted 
here.  As  is  evident  below,  a  great  deal  of  work  has 
been  done  in  some  areas,  such  as  objective  product 
metrics,  and  much  less  in  other  areas,  such  as  sub¬ 
jective  product  metrics. 


5.  Measurement  Scales  for  Software  Metrics 

Software metric data should be collected with a specific
purpose in mind. Ordinarily, the purpose is for
use in some process model, and this may involve
using the data in other calculations or subjecting
them to statistical analyses. Before data are collected
and used, it is important to consider the type
of  information  involved.  Four  basic  types  of  meas¬ 
ured  data  are  recognized  by  statisticians — nominal, 
ordinal,  interval,  and  ratio.  (The  following  discus¬ 
sion  of  these  types  of  data  is  adapted  from 
[Conte86],  beginning  on  page  127.) 

The  four  basic  types  of  data  are  described  by  the 
following  table: 


Type of Data    Possible Operations    Description of Data

Nominal         =, ≠                   Categories
Ordinal         <, >                   Rankings
Interval        +, -                   Differences
Ratio           /                      Absolute zero

Operations  in  this  table  for  a  given  data  type  also 
apply  to  all  data  types  appearing  below  it. 
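
The cumulative rule just stated can be made concrete with a small Python sketch. The enum, the dictionary, and the function name are illustrative assumptions; the nominal operations are taken to be equality and inequality tests.

```python
from enum import IntEnum


class Scale(IntEnum):
    """Measurement scales, ordered from weakest to strongest."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Operations that each scale adds on top of the weaker scales.
NEW_OPERATIONS = {
    Scale.NOMINAL: ("=", "!="),
    Scale.ORDINAL: ("<", ">"),
    Scale.INTERVAL: ("+", "-"),
    Scale.RATIO: ("/",),
}


def permitted_operations(scale: Scale) -> list:
    """Collect the operations of this scale and of every weaker scale."""
    ops = []
    for s in Scale:
        if s <= scale:
            ops.extend(NEW_OPERATIONS[s])
    return ops


if __name__ == "__main__":
    for s in Scale:
        print(s.name, permitted_operations(s))
```

Under this reading, a ratio-scale metric supports every operation in the table, while a nominal-scale metric supports only tests of equality.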

Examples  of  software  metrics  can  be  found  for  each 
type  of  data. 

As an example of nominal data, one can measure the
type of program being produced by placing it into a
category  of  some  kind — database  program,  operat¬ 
ing  system,  etc.  For  such  data,  we  cannot  perform 
arithmetic  operations  of  any  type  or  even  rank  the 
possible  values  in  any  “natural  order.”  The  only  pos¬ 
sible  operation  is  to  determine  whether  program  A  is 
of  the  same  type  as  program  B.  Such  data  are  said 
to  have  a  nominal  scale,  and  the  particular  example 
given  can  be  an  important  parameter  in  a  model  of 
the  software  development  process.  The  data  might 
be  considered  either  subjective  or  objective,  depend¬ 
ing  upon  whether  the  rules  for  classification  allow 
equally  qualified  observers  to  arrive  at  different 
classifications  for  a  given  program. 

Ordinal  data,  by  contrast,  allow  us  to  rank  the 
various  data  values,  although  differences  or  ratios 
between  values  are  not  meaningful.  For  example, 
programmer  experience  level  may  be  measured  as 
low,  medium,  or  high.  (In  order  for  this  to  be  an 
objective  metric,  one  must  assume  that  the  criteria 
for  placement  in  the  various  categories  are  well- 
defined,  so  that  different  observers  always  assign  the 
same  value  to  any  given  programmer.) 

Data  from  an  interval  scale  can  not  only  be  ranked, 
but  also  can  exhibit  meaningful  differences  between 
values.  McCabe’s  complexity  measure  [McCabe76] 
might be interpreted as having an interval scale.
Differences  appear  to  be  meaningful;  but  there  is  no 
absolute  zero,  and  ratios  of  values  are  not  neces¬ 
sarily  meaningful.  For  example,  a  program  with 
complexity  value  of  6  is  4  units  more  complex  than 
a  program  with  complexity  of  2,  but  it  is  probably 
not  meaningful  to  say  that  the  first  program  is  three 
times  as  complex  as  the  second. 

Some data values are associated with a ratio scale,
which possesses an absolute zero and allows meaningful
ratios to be calculated. An example is program
size, in lines of code (LOC). A program of
2,000 lines can reasonably be interpreted as being
twice as large as a program of 1,000 lines, and programs
can obviously have zero length according to
this measure.

It  is  important  to  be  aware  of  what  measurement 
scale  is  associated  with  a  given  metric.  Many  pro¬ 
posed  metrics  have  values  from  an  interval,  ordinal, 
or  even  nominal  scale.  If  the  metric  values  are  to  be 
used  in  mathematical  equations  designed  to  repre¬ 
sent  a  model  of  the  software  process,  metrics  associ¬ 
ated  with  a  ratio  scale  may  be  preferred,  since  ratio 
scale  data  allow  most  mathematical  operations  to  be 
meaningfully  applied.  However,  it  seems  clear  that 
the  values  of  many  parameters  essential  to  the  soft¬ 
ware  development  process  cannot  be  associated  with 
a  ratio  scale,  given  our  present  state  of  knowledge. 
This  is  seen,  for  example,  in  the  categories  of 
COCOMO. 

6.  Current  State  of  Software  Metrics 

The current state of software metrics is not very
satisfying. In the past, many metrics and a number of
process models have been proposed [Mohanty81,
Kafura85, Kemerer87, Rubin87]. Unfortunately,
most of the metrics defined have lacked one or both
of two important characteristics:

•  a sound conceptual, theoretical basis

•  statistically significant experimental validation

Most  metrics  have  been  defined  by  an  individual  and 
then  tested  and  used  only  in  a  very  limited  environ¬ 
ment.  In  some  cases,  significant  successes  have 
been  reported  in  the  validation  or  application  of 
these  metrics.  However,  subsequent  attempts  to  test 
or  use  the  metrics  in  other  environments  have 
yielded  very  different  results.  These  differences  are 
not  surprising  in  view  of  the  lack  of  clear  definitions 
and  testable  hypotheses.  Nevertheless,  discrep¬ 
ancies  and  disagreements  in  reported  results  have 
left  many  observers  with  the  sense  that  the  field  of 
software  metrics  is,  at  best,  insufficiently  mature  to 
be  of  any  practical  use. 

The metrics field has no clearly defined, commonly
accepted set of essential software properties it attempts
to measure; however, it does have a large
number of metrics, only a few of which have enjoyed
any widespread use or acceptance. Even in
the case of widely studied metrics, such as LOC,
Halstead’s metrics, and McCabe’s cyclomatic complexity,
it is not universally agreed what they measure.
In various reported studies, attempts have been
made to correlate these metrics with a number of
software properties, including size, complexity,
reliability (error rates), and maintainability
[Curtis79a, Curtis79b, Kafura85, Li87, Potier82,
Woodfield81]. Thus, it is little wonder that software
practitioners are wary of any claims on behalf of
software metrics.

Many  apparently  important  software  metrics,  such 
as  type  of  product  or  level  of  programming  exper¬ 
tise,  must  be  considered  subjective  metrics  at  this 
time,  although  they  may  be  defined  more  objectively 
in  the  future.  These  metrics  are  difficult  to  construct 
because  of  the  potentially  large  number  of  factors 
involved  and  the  problems  associated  with  assessing 
or  quantifying  individual  factors.  As  a  result,  little 
definitive  work  has  been  done  to  reduce  the  uncer¬ 
tainty  associated  with  these  metrics. 

As  for  the  proposed  process  models,  few  of  these 
have  a  significant  theoretical  basis.  Most  are  based 
upon  a  combination  of  intuition,  expert  judgment, 
and  statistical  analysis  of  empirical  data.  Overall, 
the  work  has  failed  to  produce  any  single  process 
model  that  can  be  applied  with  a  reasonable  degree 
of  success  to  a  variety  of  environments.  Generally, 
significant  recalibration  is  required  for  each  new  en¬ 
vironment  in  order  to  produce  useful  results.  Fur¬ 
thermore,  the  various  models  often  use  widely  dif¬ 
ferent  sets  of  basic  parameters.  Thus,  even  a  rela¬ 
tively  small  set  of  universally  useful  metrics  has  not 
yet  emerged. 

As  a  result  of  the  above  considerations,  it  is  very 
difficult  to  interpret  and  compare  quoted  metric 
results,  especially  if  they  involve  different  environ¬ 
ments,  languages,  applications,  or  development 
methodologies. Even with an apparently simple
metric, such as LOC, differences in underlying
definitions and counting techniques may make it
impossible to compare quoted results [Jones86]. If
different programming languages are involved, metrics
involving LOC values can, if not carefully interpreted,
lead to incorrect conclusions and thereby
conceal the real significance of the data. For example,
the (computed) productivity metric LOC per
unit-time (LOC/month, for example) and cost per
LOC ($/LOC) are often used. However, if they are
not interpreted carefully, these metrics can suggest
that assembly language programmers are more productive
than high-level language programmers
(higher LOC/month and lower $/LOC), even though
the total programming cost is usually lower when
using a high-level language. Similarly, defects per
LOC and cost per defect values have often been
used as quality or productivity indicators. As in the
above case, when programming languages at different
levels are involved, these metrics may obscure
overall productivity and quality improvements by
systematically yielding lower defect per LOC and
cost per defect values for lower-level languages,
even though total defects and costs are actually
higher.
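
A small worked example may make this effect easier to see. The sketch below compares two hypothetical implementations of the same functionality; every figure in it (sizes, effort, defect counts, and the monthly labor cost) is invented for illustration and is not drawn from [Jones86] or any other study.

```python
from dataclasses import dataclass

COST_PER_PERSON_MONTH = 5000.0  # hypothetical labor rate, in dollars


@dataclass
class Project:
    name: str
    loc: int              # delivered lines of code
    person_months: float  # total effort
    defects: int          # total defects found

    def loc_per_month(self) -> float:
        return self.loc / self.person_months

    def total_cost(self) -> float:
        return self.person_months * COST_PER_PERSON_MONTH

    def cost_per_loc(self) -> float:
        return self.total_cost() / self.loc

    def defects_per_kloc(self) -> float:
        return self.defects / (self.loc / 1000)


# Same functionality, two languages (all numbers are assumptions).
assembly = Project("assembly", loc=10_000, person_months=20.0, defects=100)
high_level = Project("high-level", loc=2_500, person_months=8.0, defects=40)

if __name__ == "__main__":
    for p in (assembly, high_level):
        print(p.name, p.loc_per_month(), p.cost_per_loc(),
              p.defects_per_kloc(), p.total_cost(), p.defects)
```

With these assumed figures, the assembly version shows the higher LOC/month, the lower $/LOC, and the lower defects/KLOC, yet it costs more in total and contains more defects. The per-LOC metrics reward the language that needs more lines to do the same job.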

Despite  these  problems,  it  appears  that  the  judicious, 
methodical  application  of  software  metrics  and 
models  in  limited  environments  can  aid  significantly 
in  improving  software  quality  and  productivity 
[Basili87, Grady87]. In many cases, relatively simple
metrics such as LOC and McCabe’s complexity
metric, v(G), have been found to be reasonably good
predictors of other characteristics, such as defect
counts, total effort, and maintainability [Grady87,
Li87, Rombach87]. Thus, although useful metrics
and  models  cannot  yet  be  pulled  off  the  shelf  and 
used  indiscriminately,  careful  application  of  some  of 
the  metrics  and  models  already  available  can  yield 
useful  results  if  tuned  to  a  particular  environment. 
These  results  will  improve  further  as  we  gain  addi¬ 
tional  experience  with  current  models  and  achieve 
better  understanding  of  the  underlying  metrics  and 
their  application  to  the  software  process. 

II.  Product  Metrics 

Most  of  the  initial  work  in  product  metrics  dealt  with 
the  characteristics  of  source  code.  As  we  have  gained 
experience  with  metrics  and  models,  it  has  become  in¬ 
creasingly  apparent  that  metric  information  available 
earlier  in  the  development  cycle  can  be  of  greater  value 
in controlling the process and results. Thus, for example,
a number of papers have dealt with the size or
complexity of the software design [Troy81, Henry84,
Yau85]. More recently, Card and Agresti have devised
a metric for architectural design complexity and compared
it with subjective judgments and objective error
rates [Card88].

A  number  of  product  metrics  are  discussed  below. 
These  examples  were  chosen  because  of  their  wide  use 
or  because  they  represent  a  particularly  interesting 
point  of  view.  No  attempt  has  been  made  in  this  mod¬ 
ule  to  provide  examples  of  metrics  applicable  to  each 
work  product  of  the  software  development  cycle. 
Rather,  the  examples  discussed  reflect  the  areas  where 
most  work  on  product  metrics  has  been  done.  Ref¬ 
erences  have  been  provided  for  readers  who  are  inter¬ 
ested  in  pursuing  specialized  metrics. 

1.  Size  Metrics 

A  number  of  metrics  attempt  to  quantify  software 
“size.”  The  metric  that  is  most  widely  used,  LOC, 
suffers  from  the  obvious  deficiency  that  its  value 
cannot  be  measured  until  after  the  coding  process 
has been completed. Function points and system
Bang have the advantage of being measurable earlier
in the development process, at least as early as the
design phase, and possibly earlier. Some of Halstead’s
metrics are also used to measure software
size, but these are discussed later.

a.  Lines  of  Code 

Lines  of  code  (or  LOC)  is  possibly  the  most 
widely  used  metric  for  program  size.  It  would 
seem  to  be  easily  and  precisely  definable;  how¬ 
ever,  there  are  a  number  of  different  definitions 
for  the  number  of  lines  of  code  in  a  particular 
program.  These  differences  involve  treatment  of 
blank  lines  and  comment  lines,  non-executable 
statements,  multiple  statements  per  line,  and  mul¬ 
tiple  lines  per  statement,  as  well  as  the  question  of 
how to count reused lines of code. The most
common definition of LOC seems to count any
line that is not a blank or comment line, regardless
of the number of statements per line
[Boehm81, Jones86].
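
The common definition above can be illustrated with a minimal sketch. It assumes that comments occupy whole lines and begin with a single prefix character; the function name count_loc and the sample text are hypothetical, and real counting tools must also settle the other questions listed above (multi-line statements, reused code, and so on).

```python
def count_loc(source: str, comment_prefix: str = "#") -> int:
    """Count lines that are neither blank nor whole-line comments.

    Multiple statements on one line still count as a single line,
    matching the common definition described in the text.
    """
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            count += 1
    return count


# Hypothetical sample: two comment lines, one blank line, two code lines.
sample = """# compute a sum
x = 1; y = 2

# add them
z = x + y
"""

print(count_loc(sample))  # the two code lines are counted
```

Changing comment_prefix, or counting statements instead of lines, gives a different value for the same program, which is exactly the definitional problem noted above.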

LOC  has  been  theorized  to  be  useful  as  a  predic¬ 
tor  of  program  complexity,  total  development  ef¬ 
fort,  and  programmer  performance  (debugging, 
productivity).  Numerous  studies  have  attempted 
to validate these relationships. Examples are
those of [Woodfield81], comparing LOC, McCabe’s
v(G), and Halstead’s E as indicators of
programming effort, and [Curtis79a] and [Curtis79b],
comparing LOC with other metrics as indicators
of programmer performance.

In  a  recent  study,  Levitin  concludes  that  LOC  is  a 
poorer  measure  of  size  than  Halstead’s  program 
length, N, discussed below [Levitin86].

b.  Function  Points 

Albrecht has proposed a measure of software size
that can be determined early in the development
process. The approach is to compute the total
function points (FP) value for the project, based
upon the number of external user inputs, inquiries,
outputs, and master files. The value of
FP is the total of these individual values, with the
following weights applied: inputs: 4, outputs: 5,
inquiries: 4, and master files: 10. Each FP contributor
can also be adjusted within a range of
±35% for specific project complexity [Albrecht83].
Function points are intended to be a measure
of program size and, thus, of the effort required for
development. Examples of studies that validate this
metric are those of Albrecht [Albrecht83] (comparing
LOC and FP as predictors of development
effort) and Behrens [Behrens83] (attempting to
correlate FP values with productivity and development
effort in a production environment). A
more recent study has been reported by Knafl and
Sacks [Knafl86].
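
The weighted sum described above can be sketched as follows. The weights and the ±35% range are those given in the text; the function name, the dictionary keys, and the project counts are hypothetical, and the adjustment is applied per contributor as the text describes.

```python
# Weights for each function point contributor, as listed in the text.
WEIGHTS = {"inputs": 4, "outputs": 5, "inquiries": 4, "master_files": 10}


def function_points(counts, adjustments=None):
    """Return the weighted function point total.

    counts      -- number of items for each contributor
    adjustments -- optional complexity adjustment per contributor,
                   expressed as a fraction between -0.35 and +0.35
    """
    adjustments = adjustments or {}
    total = 0.0
    for name, weight in WEIGHTS.items():
        adjustment = adjustments.get(name, 0.0)
        if abs(adjustment) > 0.35:
            raise ValueError(f"adjustment for {name} exceeds 35%")
        total += weight * counts.get(name, 0) * (1.0 + adjustment)
    return total


# Hypothetical project counts.
project = {"inputs": 10, "outputs": 8, "inquiries": 5, "master_files": 3}

print(function_points(project))                     # unadjusted total
print(function_points(project, {"outputs": 0.25}))  # outputs judged complex
```

Because every input to this calculation is available from the external specification, the value can be computed before any code exists.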

c.  Bang

DeMarco  defines  system  Bang  as  a  function 
metric,  indicative  of  the  size  of  the  system.  In 
effect,  it  measures  the  total  functionality  of  the 
software  system  delivered  to  the  user.  Bang  can 
be  calculated  from  certain  algorithm  and  data 
primitives  available  from  a  set  of  formal  specifi¬ 
cations  for  the  software.  The  model  provides  dif¬ 
ferent  formulas  and  criteria  for  distinguishing  be¬ 
tween  complex  algorithmic  versus  heavily  data- 
oriented  systems.  Since  Bang  measures  the  func¬ 
tionality  delivered  to  the  user,  DeMarco  suggests 
that a reasonable project goal is to maximize
“Bang per Buck,” that is, Bang divided by the total
project cost [DeMarco82].

2.  Complexity  Metrics 

Numerous  metrics  have  been  proposed  for  measur¬ 
ing  program  complexity — probably  more  than  for 
any other program characteristic. The examples
discussed below are some of the better known
complexity metrics. A recent study by Li and Cheung
compares 31 different complexity metrics, including
most of those discussed below [Li87]. Another
recent study by Rodriguez and Tsai compares LOC,
v(G), Kafura’s information flow metric, and
Halstead’s volume, V, as measures of program size,
complexity, and quality [Rodriguez86]. Attempts to
devise new measures of software complexity
continue, as evidenced by recent articles [Card88,
Harrison87].

As  noted  for  size  metrics,  measures  of  complexity 
that  can  be  computed  early  in  the  software  devel¬ 
opment  cycle  will  be  of  greater  value  in  managing 
the  software  process.  Theoretically,  McCabe’s 
measure  [McCabe76]  is  based  on  the  final  form  of 
the  computer  code.  However,  if  the  detailed  design 
is  specified  in  a  program  design  language  (PDL),  it 
should be possible to compute v(G) from that
detailed design. This is also true for the information
flow metric of Kafura and Henry [Kafura81]. It
should be noted here that Halstead’s metrics
[Halstead77] are often studied as possible measures
of software complexity.

a.  Cyclomatic  Complexity — v(G) 

Given any computer program, we can draw its
control flow graph, G, wherein each node
corresponds to a block of sequential code and each
arc corresponds to a branch or decision point in
the program. The cyclomatic complexity of such
a graph can be computed by a simple formula
from graph theory, as v(G) = e - n + 2, where e is
the number of edges, and n is the number of
nodes in the graph. McCabe proposed that v(G)
can be used as a measure of program complexity
and, hence, as a guide to program development
and testing. For structured programs, v(G) can be
computed without reference to the program flow
graph by using only the number of decision points
in the program text [McCabe76]. McCabe’s
cyclomatic complexity metric has been related to
programming effort, debugging performance, and
maintenance effort. The studies by Curtis and
Woodfield referenced earlier also report results
for this metric [Curtis79b, Woodfield81,
Harrison82].
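
Both ways of obtaining v(G) can be sketched briefly. The flow graph below is a hypothetical example (an if/else followed by a while loop), and the node names are invented for illustration. The shortcut assumes a structured program with binary decisions, for which v(G) equals the number of decision points plus one.

```python
def cyclomatic_complexity(edges):
    """Return v(G) = e - n + 2 for a single-entry, single-exit flow graph.

    edges is a list of (source, target) pairs; the node set is inferred
    from the endpoints of the edges.
    """
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2


def cyclomatic_from_decisions(decision_points):
    """Shortcut for structured programs: decision points plus one."""
    return decision_points + 1


# Hypothetical flow graph: an if/else followed by a while loop.
flow_graph = [
    ("entry", "then"), ("entry", "else"),          # if/else decision
    ("then", "loop_test"), ("else", "loop_test"),
    ("loop_test", "body"), ("body", "loop_test"),  # while loop
    ("loop_test", "exit"),
]

print(cyclomatic_complexity(flow_graph))  # 7 edges - 6 nodes + 2
print(cyclomatic_from_decisions(2))       # one if, one while
```

The two results agree for this graph, which is the property that lets v(G) be read directly from program text or from a PDL description.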

b.  Extensions  to  v(G) 

Myers  noted  that  McCabe’s  cyclomatic  com¬ 
plexity  measure,  v(G),  provides  a  measure  of  pro¬ 
gram  complexity  but  fails  to  differentiate  the 
complexity  of  some  rather  simple  cases  involving 
single conditions (as opposed to multiple
conditions) in conditional statements. As an
improvement to the original formula, Myers
suggests extending v(G) to v'(G) = [l:u], where l and
u are lower and upper bounds, respectively, for
the complexity. This formula gives more
satisfactory results for the cases noted by Myers
[Myers77].

Stetter proposed that the program flow graph be
expanded to include data declarations and data
references, thus allowing the graph to depict the
program complexity more completely. If H is the
new program flow graph, it will generally contain
multiple entry and exit nodes. A function f(H)
can be computed as a measure of the flow
complexity of program H. The deficiencies noted
by Myers are also eliminated by f(H) [Stetter84].

c.  Knots 

The  concept  of  program  knots  is  related  to  draw¬ 
ing  the  program  control  flow  graph  with  a  node 
for  every  statement  or  block  of  sequential  state¬ 
ments.  A  knot  is  then  defined  as  a  necessary 
crossing  of  directional  lines  in  the  graph.  The 
same  phenomenon  can  also  be  observed  by 
simply  drawing  transfer-of-control  lines  from 
statement  to  statement  in  a  program  listing.  The 
number of knots in a program has been proposed
as a measure of program complexity
[Woodward79].
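
The listing-based view of knots lends itself to a small sketch. It assumes that each transfer of control is recorded as a (from_line, to_line) pair and that two transfers form a knot when their line spans interleave, so that the lines drawn beside the listing must cross. This interleaving test is one reading of the definition above, not a formula taken from this module, and the jump list is hypothetical.

```python
from itertools import combinations


def count_knots(jumps):
    """Count pairs of jumps whose line spans interleave (cross).

    jumps is a list of (from_line, to_line) pairs; the direction of a
    jump does not matter for the crossing test, only its span.
    """
    spans = [(min(a, b), max(a, b)) for a, b in jumps]
    knots = 0
    for (lo1, hi1), (lo2, hi2) in combinations(spans, 2):
        # Spans cross when exactly one end of one lies inside the other.
        if lo1 < lo2 < hi1 < hi2 or lo2 < lo1 < hi2 < hi1:
            knots += 1
    return knots


# Hypothetical transfers of control in a program listing.
jumps = [(2, 8), (5, 10), (6, 7)]

print(count_knots(jumps))  # only (2, 8) and (5, 10) cross
```

Nested or disjoint spans, which are typical of structured code, contribute no knots under this test.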

d.  Information  Flow 

The  information  flow  within  a  program  structure 
may  also  be  used  as  a  metric  for  program  com¬ 
plexity.  Kafura  and  Henry  have  proposed  such  a 
measure.  Basically,  their  method  counts  the  num¬ 
ber  of  local  information  flows  entering  (fan-in) 
and exiting (fan-out) each procedure. The
procedure’s complexity is then defined as:

$$c = [\text{procedure length}] \cdot [\text{fan-in} \times \text{fan-out}]^2$$

[Kafura81]. This information flow metric is
compared with Halstead’s E metric and McCabe’s
cyclomatic complexity in [Henry81]. Complexity
metrics such as v(G) and Kafura’s information
flow metric have been shown by Rombach to be
useful measures of program maintainability
[Rombach87].
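
The information flow formula is easily evaluated once the three inputs are known. In the following sketch, the procedure names and their length, fan-in, and fan-out values are hypothetical.

```python
def information_flow_complexity(length, fan_in, fan_out):
    """Return c = length * (fan_in * fan_out) ** 2 for one procedure."""
    return length * (fan_in * fan_out) ** 2


# Hypothetical procedures: (length, fan-in, fan-out).
procedures = {
    "parse": (20, 3, 2),
    "report": (45, 1, 1),
}

for name, (length, fan_in, fan_out) in procedures.items():
    print(name, information_flow_complexity(length, fan_in, fan_out))
```

Because the flow term is squared, a short procedure with many connections can score well above a longer, more isolated one, as the two hypothetical procedures show.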

3.  Halstead’s  Product  Metrics 


Most  of  the  product  metrics  proposed  have  applied 
to  only  one  particular  aspect  of  the  software  product. 
In  contrast,  Halstead’s  software  science  proposed  a 
unified  set  of  metrics  that  apply  to  several  aspects  of 
programs,  as  well  as  to  the  overall  software  produc¬ 
tion  effort.  Thus,  it  is  the  first  set  of  software 
metrics  unified  by  a  common  theoretical  basis.  In 
this section, we discuss the program vocabulary (n),
length (N), and volume (V) metrics. These metrics
apply specifically to the final software product.
Halstead also specified formulas for computing the
total effort (E) and development time (T) for the
software  product.  These  metrics  are  discussed  in 
Section  III. 


a.  Program  Vocabulary 

Halstead  theorized  that  computer  programs  can  be 
visualized  as  a  sequence  of  tokens,  each  token 
being  classified  as  either  an  operator  or  operand. 
He then defined the vocabulary, n, of the program
as:

$$n = n_1 + n_2 ,$$

where $n_1$ = the number of unique operators in
the program and

$n_2$ = the number of unique operands in
the program.

Thus, n is the total number of unique tokens from
which the program has been constructed
[Halstead77].


b.  Program  Length 

Having identified the basic tokens used to
construct the program, Halstead then defined the
program length, N, as the count of the total number
of operators and operands in the program.
Specifically:

$$N = N_1 + N_2 ,$$

where $N_1$ = the total number of operators in the
program and

$N_2$ = the total number of operands in the
program.


Thus,  N  is  clearly  a  measure  of  the  program’s 
size,  and  one  that  is  derivable  directly  from  the 
program itself. In practice, however, the
distinction between operators and operands may be
nontrivial, thus complicating the counting process
[Halstead77].


Halstead theorized that an estimated value for N,
designated N', can be calculated from the values
of $n_1$ and $n_2$ by using the following formula:

$$N' = n_1 \log_2 n_1 + n_2 \log_2 n_2 .$$

Thus, N is a primitive metric, directly observable
from the finished program, while N' is a
computed metric, which can be calculated from the
actual or estimated values of $n_1$ and $n_2$ before
the final code is actually produced. A number of
studies lend empirical support to the validity of
the equation for the computed program length,
N'. Examples are reported by Elshoff and can
also be found in Halstead’s book. Other studies
have attempted to relate N and N' to other
software properties, such as complexity [Potier82] and
defect rates [Elshoff76, Halstead77, Levitin86,
Li87, Shen85].

c.  Program  Volume 

Another  measure  of  program  size  is  the  program 
volume,  V,  which  was  defined  by  Halstead  as: 

$$V = N \log_2 n .$$

Since  N  is  a  pure  number,  the  units  of  V  can  be 
interpreted  as  bits,  so  that  V  is  a  measure  of  the 
storage  volume  required  to  represent  the  program. 
Empirical  studies  by  Halstead  and  others  have 
shown  that  the  values  of  LOC,  N,  and  V  appear  to 
be  linearly  related  and  equally  valid  as  relative 
measures of program size [Christensen81,
Elshoff78, Li87].
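
The vocabulary, length, estimated length, and volume formulas can be exercised together. The sketch below assumes the tokens have already been classified by hand as operators or operands, which, as noted above, is the nontrivial step. The token lists correspond to the hypothetical statements a = b + c, d = a * b, and c = d - a.

```python
import math


def halstead_size_metrics(operators, operands):
    """Compute n, N, N' and V from classified token lists."""
    n1, n2 = len(set(operators)), len(set(operands))  # unique tokens
    N1, N2 = len(operators), len(operands)            # total tokens
    n = n1 + n2                                       # vocabulary
    N = N1 + N2                                       # observed length
    N_est = n1 * math.log2(n1) + n2 * math.log2(n2)   # estimated length N'
    V = N * math.log2(n)                              # volume, in bits
    return {"n": n, "N": N, "N_est": N_est, "V": V}


# Hand-classified tokens for three hypothetical assignment statements.
operators = ["=", "+", "=", "*", "=", "-"]
operands = ["a", "b", "c", "d", "a", "b", "c", "d", "a"]

metrics = halstead_size_metrics(operators, operands)
print(metrics)
```

For these tokens the observed length is 15 and the estimated length is 16, a small-scale illustration of the agreement between N and N' that the cited studies examine.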

4.  Quality  Metrics 

One  can  generate  long  lists  of  quality  characteristics 
for  software — correctness,  efficiency,  portability, 
maintainability,  reliability,  etc.  Early  examples  of 
work  on  quality  metrics  are  discussed  by  Boehm, 
McCall, and others [Boehm76, McCall77].
Unfortunately, the characteristics often overlap and
conflict with one another; for example, increased
portability (desirable) may result in lowered efficiency
(undesirable).  Thus,  useful  definitions  of  general 
quality  metrics  are  difficult  to  devise,  and  most  com¬ 
puter  scientists  have  abandoned  efforts  to  find  any 
single  metric  for  overall  software  quality. 

Although  a  good  deal  of  work  has  been  done  in  this 
area,  it  exhibits  less  commonality  of  direction  or  de¬ 
finition  than  other  areas  of  metric  research,  such  as 
software  size  or  complexity.  Three  areas  that  have 
received  considerable  attention  are:  program  correct¬ 
ness,  as  measured  by  defect  counts;  software  reli¬ 
ability,  as  computed  from  defect  data;  and  software 
maintainability,  as  measured  by  various  other 
metrics,  including  complexity  metrics.  Examples 
from  these  areas  are  discussed  briefly  below. 

Software quality is a characteristic that, theoretically
at least, can be measured at every phase of the
software development cycle. Cerino discusses the
measurement of quality at some of these phases in
[Cerino86].

a.  Defect  Metrics 

The  number  of  defects  in  the  software  product 
should  be  readily  derivable  from  the  product  it¬ 
self;  thus,  it  qualifies  as  a  product  metric.  How¬ 
ever,  since  there  is  no  effective  procedure  for 
counting  the  defects  in  the  program,  the  following 
alternative measures have been proposed:

•  number  of  design  changes 

•  number  of  errors  detected  by  code  in¬ 
spections 

•  number  of  errors  detected  in  program 
tests 

•  number  of  code  changes  required 

These  alternative  measures  are  dependent  upon 
both  the  program  and  the  outcome  or  result  of 
some  phase  of  the  development  cycle. 

The  number  of  defects  observed  in  a  software 
product  provides,  in  itself,  a  metric  of  software 
quality.  Studies  have  attempted  to  establish 
relationships  between  this  and  other  metrics  that 
might be available earlier in the development
cycle and that might, therefore, be useful as
predictors of program quality [Curtis79b, Potier82,
Shen85, Rodriguez86].

b.  Reliability  Metrics 

It  would  be  useful  to  know  the  probability  of  soft¬ 
ware  failure,  or  the  rate  at  which  software  errors 
will  occur.  Again,  although  this  information  is 
inherent  in  the  software  product,  it  can  only  be 
estimated  from  data  collected  on  software  defects 
as  a  function  of  time.  If  certain  assumptions  are 
made,  these  data  can  then  be  used  to  model  and 
compute  software  reliability  metrics.  These 
metrics  attempt  to  measure  and  predict  the  proba¬ 
bility  of  failure  during  a  particular  time  interval, 
or  the  mean  time  to  failure  (MTTF).  Since  these 
metrics  are  usually  discussed  in  the  context  of  de¬ 
veloping  a  reliability  model  of  the  software 
product’s  behavior,  more  detailed  discussion  of 
this  model  is  deferred  to  the  section  on  process 
models.  Significant  references  in  this  area  are 
[Ruston79], [Musa75], and [Musa87].

c.  Maintainability  Metrics 

A  number  of  efforts  have  been  made  to  define 
metrics  that  can  be  used  to  measure  or  predict  the 
maintainability  of  the  software  product  [Yau80, 
Yau85]. For example, an early study by Curtis, et
al., investigated the ability of Halstead’s effort
metric, E, and v(G) to predict the psychological
complexity of software maintenance tasks
[Curtis79a]. Assuming such predictions could be made
accurately, complexity metrics could then be
profitably used to reduce the cost of software
maintenance [Harrison82]. More recently,
Rombach has published the results of a carefully
designed experiment that indicates that software
complexity metrics can be used effectively to
explain or predict the maintainability of software in
a distributed computer system [Rombach87]. A
similar study, based on three different versions of
a medium-sized software system that evolved
over a period of three years, relates seven
different complexity metrics to the recorded experience
with maintenance activities [Kafura87]. The
complexity metrics studied included both measures of
the  internal  complexity  of  software  modules  and 
measures  of  the  complexity  of  interrelationships 
between  software  modules.  The  study  indicates 
that  such  metrics  can  be  quite  useful  in  measuring 
maintainability  and  in  directing  design  or  rede¬ 
sign  activities  to  improve  software  maintainabil¬ 
ity. 

III.  Process  Metrics,  Models,  and  Empirical 
Validation 

1.  General  Considerations 

Software  metrics  may  be  defined  without  specific 
reference  to  a  well-defined  model,  as,  for  example, 
the  metric  LOC  for  program  size.  However,  more 
often  metrics  are  defined  or  used  in  conjunction  with 
a  particular  model  of  the  software  development 
process.  In  this  curriculum  module,  the  intent  is  to 
focus  on  those  metrics  that  can  best  be  used  in 
models  to  predict,  plan,  and  control  software  devel¬ 
opment,  thereby  improving  our  ability  to  manage  the 
process. 

Models  of  various  types  are  simply  abstractions  of 
the  product  or  process  we  are  interested  in  describ¬ 
ing.  Effective  models  allow  us  to  ignore  uninterest¬ 
ing  details  and  concentrate  on  essential  aspects  of 
the  artifact  described  by  the  model.  Preference 
should  be  given  to  the  simplest  model  that  provides 
adequate  descriptive  capability  and  some  measure  of 
intuitive  acceptability.  A  good  model  should  pos¬ 
sess  predictive  capabilities,  rather  than  being  merely 
descriptive  or  explanatory. 

In  general,  models  may  be  analytic-constructive  or 
empirical-descriptive  in  nature.  There  have  been 
few  analytic  models  of  the  software  process,  the 
most  notable  exception  being  Halstead’s  software 
science,  which  has  received  mixed  reactions.  Most 
proposed  software  models  have  resulted  from  a 
combination  of  intuition  about  the  basic  form  of 
relationships  and  the  use  of  empirical  data  to  deter¬ 
mine  the  specific  quantities  involved  (the  coef¬ 
ficients  of  independent  variables  in  hypothesized 
equations,  for  example). 

Ultimately, the validity of software metrics and
models must be established by demonstrated
agreement with empirical or experimental data. This
requires careful attention to taking measurements and
analyzing  data.  In  general,  the  work  of  analyzing 
and  validating  software  metrics  and  models  requires 
both  sound  statistical  methods  and  sound  experimen¬ 
tal  designs.  Precise  definitions  of  the  metrics  in¬ 
volved  and  the  procedures  for  collecting  the  data  are 
essential  for  meaningful  results.  Small-scale  experi¬ 
ments  should  be  designed  carefully,  using  well- 
established  principles  of  experimental  design.  Un¬ 
fortunately,  validation  of  process  models  involving 
larger  projects  must  utilize  whatever  data  can  be  col¬ 
lected.  Carefully  controlled  large  experiments  are 
virtually impossible to conduct. Guidance in the
area of data collection for software engineering
experiments is provided by Basili and Weiss [Basili84].

A  knowledge  of  basic  statistical  theory  is  essential 
for  conducting  meaningful  experiments  and  analyz¬ 
ing  the  resulting  data.  In  attempting  to  validate  the 
relationships  of  a  given  model,  one  must  use  appro¬ 
priate  statistical  procedures  and  be  careful  to  inter¬ 
pret  the  results  objectively.  Most  studies  of  software 
metrics  have  used  some  form  of  statistical  correla¬ 
tion,  often  without  proper  regard  for  the  theoretical 
basis  or  limitations  of  the  methods  used.  In  practice, 
software  engineers  lacking  significant  background  in 
statistical  methods  should  consider  enlisting  the  aid 
of  a  statistical  consultant  if  serious  metric  evaluation 
work  is  undertaken. 

Representative  examples  of  software  models  are 
presented  below.  For  papers  that  compare  the 
various models, see [Kemerer87] and [Rubin87].

Teaching  Consideration:  In  any  given  unit  of  instruction 
based  on  this  module,  at  least  one  or  two  examples  of  each 
type  of  model  should  be  covered  in  some  detail. 

2.  Empirical  Models 

One  of  the  earliest  models  used  to  project  the  cost  of 
large-scale  software  projects  was  described  by  Wol- 
verton  of  TRW  in  1974.  The  method  relates  a  pro¬ 
posed  project  to  similar  projects  for  which  historical 
cost  data  are  available.  It  is  assumed  that  the  cost  of 
the  new  project  can  be  projected  using  this  historical 
data.  The  method  assumes  that  a  waterfall-style  life 
cycle  model  is  used.  A  25  x  7  structural  forecast 
matrix  is  used  to  allocate  resources  to  various  phases 
of  the  life  cycle.  In  order  to  determine  actual  soft¬ 
ware  costs,  each  software  module  is  first  classified 
as  belonging  to  one  of  six  basic  types — control,  I/O, 
etc.  Then,  a  level  of  difficulty  is  assigned  by 
categorizing  the  module  as  new  or  old  and  as  easy, 
medium,  or  hard.  This  gives  a  total  of  six  levels  of 
module  difficulty.  Finally,  the  size  of  the  module  is 
estimated,  and  the  system  cost  is  determined  from 
historical  cost  data  for  software  with  similar  size, 
type, and difficulty ratings [Wolverton74].


3.  Statistical  Models 

C.  E.  Walston  and  C.  P.  Felix  of  IBM  used  data 
from  60  previous  software  projects  completed  by  the 
Federal  Systems  Division  to  develop  a  simple  model 
of software development effort. The metric LOC
was assumed to be the principal determiner of
development effort. A relationship of the form

$$E = a L^b$$

was assumed, where L is the number of lines of
code, in thousands, and E is the total effort required,
in person-months. Regression analysis was used to
find appropriate values of parameters a and b. The
resulting equation was

$$E = 5.2 L^{0.91} .$$

Nominal programming productivity, in LOC per
person-month, can then be calculated as L/E (with L
converted from thousands of lines to lines). In
order to account for deviations from the derived
form for E, Walston and Felix also tried to develop a
productivity index, I, which would increase or
decrease the productivity, depending upon the nature
of the project. The computation of I was to be based
upon evaluations of 29 project variables (culled
from an original list of 68 possible determiners of I)
[Walston77].
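
The Walston-Felix relationship is simple enough to evaluate directly. The sketch below uses the fitted constants quoted in the text; the function names and project sizes are hypothetical, and the productivity index I is not modeled.

```python
def walston_felix_effort(kloc):
    """Effort in person-months for a program of kloc thousand lines."""
    return 5.2 * kloc ** 0.91


def nominal_productivity(kloc):
    """Nominal productivity in lines of code per person-month."""
    return 1000.0 * kloc / walston_felix_effort(kloc)


# Hypothetical project sizes, in thousands of lines of code.
for size in (1.0, 10.0, 100.0):
    effort = walston_felix_effort(size)
    rate = nominal_productivity(size)
    print(f"{size:6.1f} KLOC: {effort:7.1f} person-months, {rate:6.1f} LOC/PM")
```

Since the fitted exponent is below 1, the equation implies that nominal productivity rises slowly with project size. That is a property of this particular regression and not a general result.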

4.  Theory-Based  Models 

Few  of  the  proposed  models  have  substantial  theo¬ 
retical  bases.  Two  examples  that  do  are  presented 
below. 

a.  Rayleigh  Model 

L.  H.  Putnam  developed  a  model  of  the  software 
development  process  based  upon  the  assumption 
that  the  personnel  utilization  during  program  de¬ 
velopment  is  described  by  a  Rayleigh-type  curve 
such  as  the  following: 


$$y = \frac{K}{T^2} \, t \, e^{-t^2/(2T^2)}$$

where y = the number of persons on the project
at any time, t;

K = the area under the Rayleigh curve,
equal to the total life cycle effort in
person-years; and

T = development time (time of peak
staffing requirement).

Putnam  assumed  that  either  the  overall  staffing 
curve  or  the  staffing  curves  for  individual  phases 
of  the  development  cycle  can  be  modeled  by  an 
equation of this form. He then developed the
following relationship between the size of the
software product and the development time
[Putnam78, Putnam80]:

$$S = C K^{1/3} T^{4/3} ,$$

where S = the number of source LOC delivered;

K = the life-cycle effort in person-years;
and

C = a state-of-technology constant.
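
The Rayleigh staffing curve and the size relationship can be explored numerically. The sketch below takes the staffing curve in the form y = (K/T^2) t exp(-t^2/(2T^2)), which has area K and peaks at t = T as the definitions above require. The function names and all numeric values, including the technology constant, are hypothetical.

```python
import math


def rayleigh_staffing(t, K, T):
    """Persons on the project at time t (area K, peak at t = T)."""
    return (K / T ** 2) * t * math.exp(-t ** 2 / (2 * T ** 2))


def delivered_size(C, K, T):
    """Source LOC delivered: S = C * K**(1/3) * T**(4/3)."""
    return C * K ** (1 / 3) * T ** (4 / 3)


def required_effort(S, C, T):
    """Life-cycle effort K obtained by solving the size equation for K."""
    return (S / (C * T ** (4 / 3))) ** 3


# Hypothetical project: 100 person-years, peak staffing at 2 years.
K, T, C = 100.0, 2.0, 5000.0
for t in (0.5, 1.0, 2.0, 3.0, 4.0):
    print(f"t = {t:3.1f} years: {rayleigh_staffing(t, K, T):5.1f} persons")

size = delivered_size(C, K, T)
print(required_effort(size, C, T / 2) / required_effort(size, C, T))
```

Solving the size equation for K shows that effort varies as the inverse fourth power of development time for a fixed size, so halving the schedule multiplies the required effort by 16 in this model.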

b.  Software  Science  Model — Halstead 

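The software science equations can be used as a
simple theoretical model of the software
development process. The effort required to develop
the software is given by the equation E = V/L,
where L here denotes Halstead’s program level
rather than lines of code. This effort can be
approximated by:

$$E \approx \frac{n_1 N_2 \, [\, n_1 \log_2 n_1 + n_2 \log_2 n_2 \,] \, \log_2 n}{2 n_2} .$$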

The  units  of  E  are  elementary  mental  discrim¬ 
inations.  The  corresponding  programming  time 
(in  seconds)  is  simply  derived  from  E  by  dividing 
by  the  Stroud  number,  S: 

T  =  E/S  . 

The  value  of  S  is  usually  taken  as  18  for  these 
calculations.  If  only  the  value  of  length,  N,  is 
known,  then  the  following  approximation  can  be 
used  for  computing  T: 

$$T \approx \frac{N^2 \log_2 n}{4S} ,$$

where n can be obtained from the relationship
$N = n \log_2 (n/2)$ [Halstead77, Woodfield81].
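
The effort and time equations can be evaluated for a small example. The sketch below assumes the usual software science approximation E = n1 N2 (n1 log2 n1 + n2 log2 n2) log2 n / (2 n2) and the length-only approximation T = N^2 log2 n / (4S). The function names and token counts are hypothetical.

```python
import math

STROUD_NUMBER = 18  # elementary mental discriminations per second


def halstead_effort(n1, n2, N2):
    """Approximate effort E from unique counts n1, n2 and operand total N2."""
    n = n1 + n2
    estimated_length = n1 * math.log2(n1) + n2 * math.log2(n2)
    return n1 * N2 * estimated_length * math.log2(n) / (2 * n2)


def programming_time(effort, stroud=STROUD_NUMBER):
    """Programming time in seconds, T = E / S."""
    return effort / stroud


def time_from_length(N, n, stroud=STROUD_NUMBER):
    """Approximate time in seconds when only length and vocabulary are known."""
    return N ** 2 * math.log2(n) / (4 * stroud)


# Hypothetical counts: 4 unique operators, 4 unique operands, 9 operand uses.
effort = halstead_effort(n1=4, n2=4, N2=9)
print(effort, programming_time(effort))
print(time_from_length(N=15, n=8))
```

The two time estimates differ because the length-only form assumes that operators and operands are equally numerous, which this small example does not satisfy.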

5.  Composite  Models 

As  experience  has  been  gained  with  previous 
models,  a  number  of  more  recent  models  have  util¬ 
ized  some  combination  of  intuition,  statistical  anal¬ 
yses,  and  expert  judgment.  These  have  been  labeled 
“composite  models”  by  Conte,  et  al.  Several  models 
are listed below [Conte86].

a.  COCOMO — Boehm 

This  is  probably  the  best  known  and  most 
thoroughly  documented  of  all  software  cost  es¬ 
timating  models.  It  provides  three  levels  of 
models:  basic,  intermediate,  and  detailed.  Boehm 
identifies  three  modes  of  product  development — 
organic,  semidetached,  and  embedded — that  aid 
in determining the difficulty of the project. The
developmental effort equations are all of the
form:

$$E = a S^b m ,$$

where a and b are constants determined for each
mode and model level;

S is the value of source LOC; and

m is a composite multiplier, determined
from 15 cost-driver attributes.

Boehm suggests that the detailed model will
provide cost estimates that are within 20% of actual
values 70% of the time, or PRED(0.20) = 0.70
[Boehm81, Boehm84].
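
The general form of the COCOMO effort equation can be illustrated with the basic-level model. The coefficients below are those commonly quoted for Boehm's basic model, with size expressed in thousands of delivered source instructions and effort in person-months. They are not given in this module and should be treated as assumptions, as should the function name and the sample project.

```python
# Commonly quoted basic-level coefficients (a, b) per development mode.
# These values are assumptions here; the module text does not list them.
BASIC_COEFFICIENTS = {
    "organic": (2.4, 1.05),
    "semidetached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def cocomo_effort(size_kdsi, mode, multiplier=1.0):
    """Effort in person-months, E = a * S**b * m.

    size_kdsi  -- size in thousands of delivered source instructions
    mode       -- "organic", "semidetached", or "embedded"
    multiplier -- composite cost-driver multiplier m (1.0 at basic level)
    """
    a, b = BASIC_COEFFICIENTS[mode]
    return a * size_kdsi ** b * multiplier


# Hypothetical 32-KDSI project evaluated in each mode.
for mode in BASIC_COEFFICIENTS:
    print(f"{mode:13s} {cocomo_effort(32.0, mode):7.1f} person-months")
```

At the intermediate and detailed levels, the multiplier would be the product of the cost-driver ratings and a and b would take different values. Neither refinement is attempted in this sketch.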


b.  SOFTCOST— Tausworthe 

Tausworthe,  of  the  Jet  Propulsion  Laboratory,  at¬ 
tempted  to  develop  a  software  cost  estimation 
model  using  the  best  features  of  other  relatively 
successful  models  available  at  the  time.  His 
model  incorporates  the  quality  factors  from 
Walston-Felix and the Rayleigh model of Putnam,
among  other  features.  It  requires  a  total  of  68 
input  parameters,  whose  values  are  deduced  from 
the  user’s  response  to  some  47  questions  about 
the  project.  Latest  reports  suggest  that  this  model 
has  not  been  tested  or  calibrated  adequately  to  be 
of general interest [Tausworthe81, Conte86].

c.  SPQR  Model — Jones 

T.  Capers  Jones  has  developed  a  software  cost 
estimation  model  called  the  Software  Produc¬ 
tivity,  Quality,  and  Reliability  (SPQR)  model. 
The  basic  approach  is  similar  to  that  of  Boehm’s 
COCOMO  model.  It  is  based  on  20  reasonably 
well-defined  and  25  not-so-well-defined  factors 
that  influence  software  costs  and  productivity. 
SPQR  is  a  commercial  product,  but  it  is  not  as 
thoroughly  documented  as  some  other  models. 
The  computer  model  requires  user  responses  to 
more  than  100  questions  about  the  project  in  or¬ 
der  to  formulate  the  input  parameters  needed  to 
compute  development  costs  and  schedules.  Jones 
claims  that  it  is  possible  for  a  model  such  as 
SPQR  to  provide  cost  estimations  that  will  come 
within  15%  of  actual  values  90%  of  the  time,  or 
PRED(0.15) = 0.90 [Jones86].
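
Accuracy claims of the form PRED(0.15) = 0.90 can be checked against project records. The sketch below takes PRED(l) to be the fraction of projects whose estimate falls within a fraction l of the actual value, which is the reading used in the text. The function name and the project data are hypothetical.

```python
def pred(level, actuals, estimates):
    """Fraction of projects with |actual - estimate| / actual <= level."""
    if len(actuals) != len(estimates) or not actuals:
        raise ValueError("need equally sized, non-empty data sets")
    within = sum(
        1
        for actual, estimate in zip(actuals, estimates)
        if abs(actual - estimate) / actual <= level
    )
    return within / len(actuals)


# Hypothetical actual and estimated efforts, in person-months.
actual_effort = [100.0, 200.0, 50.0, 80.0]
estimated_effort = [110.0, 150.0, 52.0, 100.0]

print(pred(0.20, actual_effort, estimated_effort))
```

Two of the four hypothetical estimates are within 20% of their actual values, so this data set gives PRED(0.20) = 0.5.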

d.  COPMO — Thebaut 

Thebaut  proposed  a  software  development  model 
that  attempts  to  account  specifically  for  the  addi¬ 
tional  effort  required  when  teams  of  programmers 
are  involved  on  large  projects.  Thus,  the  model  is 
not  appropriate  for  small  projects.  The  general 
form of the equation for the effort, E, is assumed
to be:

$$E = a + bS + cP^d ,$$

where  a,  b,  c,  and  d  are  constants  to  be  deter¬ 
mined  from  empirical  data  via  re¬ 
gression  analysis; 

S  is  the  program  size,  in  thousands  of 
LOC;  and 

P  is  the  average  personnel  level  over  the 
life  of  the  project. 

Unfortunately,  this  model  requires  not  one  but 
two  input  parameters  whose  actual  values  are  not 
known  until  the  project  has  been  completed.  Fur¬ 
thermore,  the  constants  b  and  c  are  dependent 
upon  the  complexity  class  of  the  software,  which 
is  not  easily  determined.  This  model  presents  an 
interesting  form,  but  it  needs  further  development 
and calibration to be of widespread interest. In
view of its stage of development, no estimates of
its predictive ability are in order [Conte86,
Thebaut84].

e.  ESTIMACS — Rubin

Rubin  has  developed  a  proprietary  software  es¬ 
timating  model  that  utilizes  gross  business  speci¬ 
fications  for  its  calculations.  The  model  provides 
estimates  of  total  development  effort,  staff  re¬ 
quirements,  cost,  risk  involved,  and  portfolio  ef¬ 
fects.  At  present,  the  model  addresses  only  the 
development  portion  of  the  software  life  cycle, 
ignoring  the  maintenance  or  post-deployment 
phase.  The  ESTIMACS  model  addresses  three 
important  aspects  of  software  management — es¬ 
timation,  planning,  and  control. 

The  ESTIMACS  system  includes  the  following 
modules: 

•  System development effort estimator.
This module requires responses to 25
questions regarding the system to be
developed, development environment, etc.
It uses a database of previous project
data to calculate an estimate of the
development effort.

•  Staffing  and  cost  estimator.  Inputs  re¬ 
quired  are:  the  effort  estimation  from 
above,  data  on  employee  productivity, 
and  salary  for  each  skill  level.  Again,  a 
database  of  project  information  is  used 
to  compute  the  estimate  of  project  dura¬ 
tion,  cost,  and  staffing  required. 

•  Hardware configuration estimator.
Inputs required are: information on the
operating environment for the software
product, total expected transaction
volume, generic application type, etc.
Output is an estimate of the required
hardware configuration.

•  Risk  estimator.  This  module  calculates 
risk  using  answers  to  some  60  questions 
on  project  size,  structure,  and  technol¬ 
ogy.  Some  of  the  answers  are  computed 
automatically  from  other  information  al¬ 
ready  available. 

•  Portfolio  analyzer.  This  module  pro¬ 
vides  information  on  the  effect  of  this 
project  on  the  total  operations  of  the  de¬ 
velopment  organization.  It  provides  the 
user  with  some  understanding  of  the  to¬ 
tal  resource  demands  of  the  projects. 

The  ESTIMACS  system  has  been  in  use  for  only 
a  short  time.  In  the  future,  Rubin  plans  to  extend 
the  model  to  include  the  maintenance  phase  of  the 
software life cycle. He claims that estimates of
the total effort are within 15% of actual values
[Rubin83]. The ESTIMACS model is compared
with the GECOMO, JS-2, PCOC, SLIM, and
SPQR/10 models in [Rubin83] and [Rubin87].

6.  Reliability  Models 

A  number  of  dynamic  models  of  software  defects 
have  been  developed.  These  models  attempt  to  de¬ 
scribe  the  occurrence  of  defects  as  a  function  of 
time,  allowing  one  to  define  the  reliability,  R,  and 
mean  time  to  failure,  MTTF.  One  example  is  the 
model  described  by  Musa,  which,  like  most  others  of 
this  type,  makes  four  basic  assumptions: 

• Test inputs are random samples from the
input environment.

•  All  software  failures  are  observed. 

•  Failure  intervals  are  independent  of  each 
other. 

•  Times  between  failures  are  exponentially 
distributed. 

Based  upon  these  assumptions,  the  following  rela¬ 
tionships  can  be  derived: 

d(t) = D (1 - e^{-bct}),

where D is the total number of defects;

b, c are constants that must be determined
from historical data for similar software;

d(t) is the number (cumulative total) of
defects discovered at time t;

and

MTTF(t) = e^{bct} / (bcD).

As in many other software models, the determination
of b, c, and D is a nontrivial task, and yet a
vitally important one for the success of the model
[Ruston79, Musa75, Musa80, Musa87].
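
As a rough illustration of the exponential defect-discovery model described above, the following Python sketch evaluates d(t) and MTTF(t) for a few points in time. The parameter values for D, b, and c are invented for illustration; in practice they must be estimated from historical data. The MTTF expression is taken here as the reciprocal of the defect discovery rate d'(t) = bcD e^{-bct}, which is an assumption consistent with the model rather than a formula quoted from Musa.

```python
import math


def defects_found(t, D, b, c):
    """Cumulative defects discovered by time t: d(t) = D * (1 - exp(-b*c*t))."""
    return D * (1.0 - math.exp(-b * c * t))


def mttf(t, D, b, c):
    """Mean time to failure at time t, assumed to be 1 / d'(t) = exp(b*c*t) / (b*c*D)."""
    return math.exp(b * c * t) / (b * c * D)


# Hypothetical parameters: 100 total defects, constants chosen arbitrarily.
D, b, c = 100.0, 0.02, 1.5

for t in (0, 10, 50, 100):
    found = defects_found(t, D, b, c)
    print(f"t={t:>3}  d(t)={found:6.1f}  MTTF(t)={mttf(t, D, b, c):7.2f}")
```

Under these assumptions d(t) rises toward D while MTTF(t) grows exponentially, which is the qualitative behavior the model is meant to capture.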

IV.  Implementation  of  a  Metrics  Program 

There  is  growing  evidence,  both  from  university  re¬ 
search  and  from  industry  experience,  that  the  conscien¬ 
tious  application  of  software  metrics  can  significantly 
improve  our  understanding  and  management  of  the 
software  development  process.  For  example,  a  number 
of  software  estimating  models  have  been  developed  to 
aid  in  the  estimation,  planning,  and  control  of  software 
projects.  Generally,  these  models  have  been  developed 
by  calibrating  the  estimating  formulas  to  some  existing 
database  of  previous  software  project  information.  For 
new  projects  that  are  not  significantly  different  from 
those  in  the  database,  reasonably  accurate  predictions 
(say,  ±20%)  are  often  possible  [Boehm81,  Jones86]. 
However, numerous studies have shown that these
models cannot provide good estimates for projects that
may involve different environments, languages, or
methodologies [Kemerer87, Rubin87]. Thus, great care
must  be  taken  in  selecting  a  model  and  recalibrating  it, 
if  necessary,  for  the  new  application  environment. 
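
To make the idea of calibration concrete, here is a minimal Python sketch. It assumes, purely for illustration, an estimating formula of the form effort = a * size^b and fits a and b to a small database of past projects by least squares on the logarithms. The functional form, the function names, and the project data are all hypothetical; a real model would be calibrated against the organization's own project records.

```python
import math


def calibrate(projects):
    """Fit effort = a * size**b by least squares on log(effort) vs. log(size).

    `projects` is a list of (size, effort) pairs from past projects.
    """
    xs = [math.log(size) for size, _ in projects]
    ys = [math.log(effort) for _, effort in projects]
    n = len(projects)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def estimate(size, a, b):
    """Predicted effort for a project of the given size."""
    return a * size ** b


def relative_error(actual, predicted):
    """Signed relative error of a prediction, e.g. 0.2 means 20% too high."""
    return (predicted - actual) / actual


# Invented history: (size in KLOC, effort in person-months).
history = [(10, 26), (25, 70), (50, 150), (80, 260), (120, 400)]
a, b = calibrate(history)
print(f"calibrated constants: a={a:.2f}, b={b:.2f}")
print(f"estimate for a 60 KLOC project: {estimate(60, a, b):.0f} person-months")
```

Comparing `relative_error` for completed projects against a tolerance such as ±20% is one simple way to judge whether the calibrated model still fits a new environment.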

The selective application of software metrics to
specific phases of the software development cycle can also
be productive. For example, certain complexity
metrics have been shown to be useful in guiding software
design or redesign (maintenance) activities [Kafura87,
Rombach87, Yau85].

Encouraging reports on the use of metrics are coming
from industry also. A recent example is that of Grady
and Caswell on the experience of Hewlett-Packard
[Grady87]. They describe HP’s experience implementing
a corporate-wide software metrics program
designed to improve project management, productivity,
and product quality. The program described appears to
be  helping  to  achieve  these  goals,  both  in  the  short  run 
(on  individual  projects)  and  in  the  long  run  (with  im¬ 
proved  productivity  on  future  projects).  This  program 
may  serve  as  a  model  for  other  organizations  interested 
in  improving  their  software  development  results.  The 
HP  experience  in  establishing  an  organizational  soft¬ 
ware  metrics  program  provides  a  number  of  useful  in¬ 
sights,  including  the  following: 

•  In  addition  to  planning  carefully  for  the  tech¬ 
nical  operation  of  the  metrics  program,  the 
idea  of  such  a  program  must  be  “sold”  to  all 
individuals  involved,  from  top  management, 
who  must  find  the  resources  to  support  it,  to 
entry  level  programmers,  who  may  feel 
threatened  by  it. 

•  Although  some  short-range  benefits  may  be 
realized  on  current  projects,  the  organization 
should  expect  to  collect  data  for  at  least  three 
years  before  the  data  are  adequate  for  meas¬ 
uring  long-term  trends. 

Outlined  below  is  a  general  procedure  for  implement¬ 
ing  an  organizational  software  metrics  program.  The 
details  of  implementing  such  a  program  will,  of  course, 
vary  significantly  with  the  size  and  nature  of  the  organ¬ 
ization.  However,  all  of  the  steps  outlined  are  neces¬ 
sary,  in  one  form  or  another,  to  achieve  a  successful 
implementation  of  a  metrics  program. 

The  implementation  plan  that  follows  is  presented  as  a 
sequence  of  distinct  steps.  In  practice,  the  application 
of  this  plan  will  probably  involve  some  iteration  be¬ 
tween  steps,  just  as  occurs  with  the  application  of  spe¬ 
cific  software  development  life  cycle  models.  Al¬ 
though  it  is  not  stated  explicitly,  those  responsible  for 
establishing  the  metrics  program  must  be  concerned  at 
each  step  with  communicating  the  potential  benefits  of 
the  program  to  all  members  of  the  organization  and 
selling  the  organization  on  the  merits  of  such  a  pro¬ 
gram.  Unless  the  organization  as  a  whole  understands 
and  enthusiastically  supports  the  idea,  the  program  will 
probably  not  achieve  the  desired  results. 

1.  Planning  Process 

The implementation of a metrics program requires
careful planning.


a.  Defining  Objectives 

What  are  the  objectives  of  the  proposed  program? 
What  is  it  supposed  to  achieve,  and  how? 

A specific approach which can be used to plan the
software metrics program is the Goal/Question/Metric
(GQM) paradigm developed by Basili, et
al. [Basili84, Basili87]. This paradigm consists of
identifying  the  goals  to  be  achieved  by  the 
metrics  program  and  associating  a  set  of  related 
questions  with  each  goal.  The  answers  to  these 
questions  should  make  it  possible  to  identify  the 
quantitative  measures  that  are  necessary  to  pro¬ 
vide  the  answers  and,  thus,  to  reach  the  goals. 
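
The goal, question, and metric hierarchy can be pictured as a simple data structure. The Python sketch below is only an illustration of the idea: the class names, the helper function, and the example goal, questions, and metrics are all hypothetical and are not taken from Basili's work.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    text: str
    metrics: list = field(default_factory=list)  # names of measures that answer it


@dataclass
class Goal:
    statement: str
    questions: list = field(default_factory=list)


def metrics_to_collect(goals):
    """Return the distinct metrics implied by the goals, in first-seen order."""
    seen = []
    for goal in goals:
        for question in goal.questions:
            for metric in question.metrics:
                if metric not in seen:
                    seen.append(metric)
    return seen


# Invented example: one goal refined into two questions.
goals = [
    Goal(
        "Improve the accuracy of effort estimates",
        [
            Question("How far off were past estimates?",
                     ["estimated effort", "actual effort"]),
            Question("How does effort vary with product size?",
                     ["LOC", "actual effort"]),
        ],
    )
]

print(metrics_to_collect(goals))
```

Working top-down in this way means every metric collected can be traced back to a question and a goal, which helps avoid gathering data that no one will use.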

b.  Initial  Estimates  of  Effort  and  Cost 

Metrics  programs  are  not  free;  they  may  require 
major  commitments  of  resources.  For  this  reason, 
it  is  especially  important  to  sell  the  idea  of  such  a 
program to top management. Estimates of program
costs should be made early in the planning
process,  even  though  they  may  be  very  crude,  to 
help  the  organization  avoid  major  surprises  later 
on.  These  effort/cost  estimates  will  need  contin¬ 
uous  refinement  as  the  project  proceeds. 

(i)  Initial  Implementation 

What  are  the  costs  associated  with  the  initial 
start-up  of  the  program? 

(ii)  Continuing  Costs 

Organizations  must  expect  to  incur  continuing 
costs  to  operate  the  metrics  program.  These 
include,  for  example,  the  costs  of  collecting 
and  analyzing  data  and  of  maintaining  the 
metrics  database. 

2.  Selection  of  Model  and  Metrics 

A  specific  model  and  set  of  metrics  is  selected, 
based  upon  the  objectives  defined  and  cost  con¬ 
siderations  identified.  Given  a  choice  of  several 
models  that  seem  capable  of  meeting  the  objectives 
and  cost  requirements,  the  simplest  model  that  is  not 
intuitively  objectionable  should  be  chosen.  The 
GQM  paradigm  provides  a  practical  procedure  for 
the selection of software metrics [Basili84, Basili87].
Important  considerations  in  this  selection  process  are 
the  following  items. 

a.  Projected  Ability  to  Meet  Objectives 

Metrics  and  models  available  should  be  compared 
with  respect  to  their  apparent  ability  to  meet  the 
objectives  (goals)  identified. 

b.  Estimated  Data  Requirements  and  Cost 

Models  identified  as  capable  of  meeting  the  ob¬ 
jectives  of  the  organization  should  be  compared 
in  terms  of  data  requirements  and  associated  cost 
of implementation. As indicated above, parsimonious
models are to be preferred, if adequate.

3. Data Requirements and Database Maintenance

Once a specific model has been chosen, the data
requirements and cost estimates must be spelled out
in detail and refined. At this point, care must be
taken to collect enough, but not too much, data.
Often, the initial reaction is to collect masses of data
without regard to how it will be used, “just in case it
might be useful.” The result is usually extinction by
drowning in data. For this part of the task, the work
of Basili and Weiss on collecting data for software
engineering projects may be especially helpful
[Basili84]. Steps include the following considerations:

a. Specific Data Required

Data must be gathered throughout the software
life cycle. The specific information required at
each phase must be identified.

b. Data Gathering Procedures

Once the necessary data have been identified, the
specific methods and procedures for gathering the
data must be described, and responsible personnel
identified.

c. Database Maintenance

The database of metric data becomes an important
corporate resource. Funds, procedures, and
responsibilities for its maintenance must be spelled
out.

d. Refined Estimates of Efforts and Costs

The information generated in the preceding steps
should now make it possible to compute fairly
accurate estimates of the effort and costs involved
in implementing and continuing the software
metrics program.

4. Initial Implementation and Use of the Model

Assuming that the above steps have been carried out
successfully and that the estimated costs are acceptable,
the program can now be initiated. The following
items should be re-emphasized at this time:

a. Clarification of Use

The intended use of the metrics program should
have been made clear early on. However, it is
appropriate to restate this clearly when the program
is initiated. Are the metrics to be used only
for project management purposes? What about
their use as tools for evaluating personnel?

b. Responsible Personnel

Specific obligations of personnel who gather,
maintain, or analyze the data should be made very
clear. It is impossible to collect some types of
data after the project has been completed.

5. Continuing Use and Refinement

For the metrics program to be successful, it must be
continuously applied and the results reviewed
periodically. The following steps are involved:

a. Evaluating Results

Results should be carefully summarized and compared
with what actually happened. This is often
not done because “there wasn’t enough time.”

b. Adjusting the Model

Most models in use require a calibration process,
adapting the values of multiplicative constants,
etc., to the empirical data for the environment of
application. Based upon the results achieved,
these calibration constants should be reviewed
and adjusted, if appropriate, especially over a
long period of use, during which the environment
itself may change significantly.

V. Trends in Software Metrics


Current  trends  in  the  software  metrics  area  are  encour¬ 
aging.  Metrics  are  being  applied  more  widely,  with 
good  results  in  many  cases.  The  limitations  of  existing 
models  have  been  recognized,  and  people  are  becom¬ 
ing  more  realistic  in  their  expectations  of  what  these 
models  can  provide.  There  is  a  growing  awareness  that 
metrics  programs  pay  off,  but  not  without  some  invest¬ 
ment  of  both  time  and  resources.  As  the  benefits  of 
software  metrics  programs  become  more  evident,  the 
establishment  of  such  a  program  will  become  essential 
for  software  development  organizations  to  remain  com¬ 
petitive  in  this  area. 

As  our  experience  with  metrics  grows,  better  data  will 
become  available  for  further  research.  This,  in  turn, 
will  make  it  possible  to  develop  better  metrics  and 
models.  Although  it  is  generally  still  too  costly  to  run 
carefully  controlled  experiments  on  large-scale  soft¬ 
ware  projects,  better  experimental  data  are  becoming 
available,  and  for  larger  projects  than  in  the  past.  Such 
data  should  provide  better  insight  into  the  problems  of 
large  software  efforts.  Results  already  available  have 
improved our understanding of the metrics currently in
use  and  have  provided  insight  into  how  to  select  better 
metrics. 

Finally,  although  there  are  still  a  large  number  of 
metrics  in  use  or  under  active  investigation,  a  smaller 
set  of  metrics  is  emerging  as  having  more  practical 
utility  in  the  measurement  of  the  software  development 
process.  An  economical  set  of  metrics  capturing  the 
essential  characteristics  of  software  may  yet  emerge 
from  this  smaller,  more  useful  set. 

Software engineering is still a very young discipline.
There  are  encouraging  signs  that  we  are  beginning  to 
understand  some  of  the  basic  parameters  that  are  most 
influential  in  the  processes  of  software  production. 

Teaching Considerations


General  Comments 


In  the  past,  the  software  metrics  area  has  been 
characterized  by  a  multitude  of  candidate  metrics, 
surrounded  by  sometimes  exaggerated  and  often 
conflicting  claims.  As  a  result,  many  people,  espe¬ 
cially  practicing  software  professionals,  have  formed 
strong  opinions  about  the  validity  or  practicality  of 
software  metrics.  Anyone  intending  to  teach  a 
course  in  this  area  should  be  aware  of  this  controver¬ 
sial  atmosphere.  Depending  upon  the  students  in¬ 
volved,  the  instructor  will  have  to  take  special  care 
to  present  the  material  as  objectively  as  possible, 
pointing  out  shortcomings  where  appropriate,  but 
still  trying  to  emphasize  the  positive  potential  of  the 
field. 


Textbooks 


Although  there  is  a  fairly  extensive  literature  on  soft¬ 
ware  metrics,  textbooks  are  only  now  beginning  to 
appear.  The  only  one  available  as  of  fall  1988  that 
even  begins  to  cover  all  of  the  topics  in  this  module 
adequately  is  that  by  Conte,  Dunsmore,  and  Shen 
[Conte86].  One  may  expect  that  the  appearance  of 
this  text  and  the  continuing  interest  and  research  on 
metrics  will  result  in  a  number  of  new  texts  in  this 
area  in  the  next  few  years.  Thus,  anyone  teaching  a 
course  in  metrics  should  first  consult  with  publishers 
for  their  most  recent  offerings. 

If  the  instructor  would  like  to  place  more  emphasis 
on  the  implementation  of  software  metrics  programs, 
the recent book by Grady and Caswell, relating the
experience of Hewlett-Packard, might be considered
as a supplementary text [Grady87].


Possible  Courses 


The  material  presented  in  this  module  may  be  used 
in  various  ways  to  meet  the  needs  of  different 
audiences.  Depending  upon  the  total  time  to  be 
devoted  to  the  course  and  upon  student  backgrounds, 
software  metrics  might  be  taught  in  a  graduate 
course  of  2  or  3  quarter-  (or  semester-)  hours  or  in 
short, intensive tutorials (possibly non-credit) lasting
from a few hours to a few days. Also, a unit on
software  metrics  might  be  incorporated  into  a 
broader  software  engineering  course.  The  objectives 
and  prerequisites  listed  earlier  could  apply  to  a  2-  or 
3-hour  credit  course.  Clearly,  these  objectives  are 
not  likely  to  be  achieved  in  an  intensive,  shorter 
course.  Below  are  described  two  possible  courses 
based  on  the  material  contained  in  this  curriculum 
module. 

Graduate-Level  University  Course,  2  to  3  Quar¬ 
ter  Hours 

A  graduate-level  university  course  on  software 
metrics could be based on this module. A lecture-based
2-quarter-hour course or a similar course augmented
with a significant project and carrying 3
hours  credit  could  cover  the  material  presented  here. 
It  is  appropriate  to  ask  students  to  read  some  of  the 
significant  research  papers  on  software  metrics  in 
such  a  course. 

Coverage.  For  the  2-quarter-hour  course,  class  time 
(20  to  25  hours)  might  be  allocated  as  follows: 

I.  Introduction  (1-2  hours) 

II.  Product  Metrics  (8-10  hours) 

III.  Process  Metrics,  Models,  and  Empirical 
Validation  (8-10  hours) 

IV.  Implementation  of  a  Metrics  Program  (2 
hours) 

V.  Trends  in  Software  Metrics  (1  hour) 

Objectives.  For  a  2-quarter-hour  course,  the  first 
five  cognitive  domain  objectives  should  be  achiev¬ 
able.  For  a  3-quarter-hour  course,  all  six  objectives 
should  be  targeted.  Whether  or  not  these  objectives 
can  actually  be  achieved  is  largely  a  function  of  stu¬ 
dent  background. 

Prerequisites.  For  a  2-  or  3-quarter-hour  course, 
students  should  have  all  of  the  background  listed  un¬ 
der  Prerequisite  Knowledge.  Students  less  well  pre¬ 
pared  will  have  difficulty  in  achieving  all  course  ob¬ 
jectives.  Although  not  specifically  noted  under  Pre¬ 
requisite  Knowledge,  it  is  assumed  that  students 
have  a  solid  mathematics  background,  at  least 
through  differential  and  integral  calculus. 

Intensive  4-Hour  to  6-Hour  Tutorial 

For  non-university  audiences,  an  intensive  course  of 
4  to  6  hours  could  be  based  on  this  material.  It 
might  be  a  non-credit  offering  aimed  at  some  spe¬ 
cific  audience,  such  as  software  project  managers. 

Coverage. The coverage should concentrate on
topics  most  appropriate  for  the  particular  audience. 
For  example,  project  managers  can  be  assumed  to  be 
more  interested  in  the  implementation  of  a  metrics 
program  than  in  the  details  of  complexity  metrics. 

Objectives.  For  a  course  of  this  type,  appropriate 
objectives  might  be  only  objectives  1  and  2  in  the 
cognitive  domain.  If  students  have  good  technical 
backgrounds,  objective  3  might  also  be  appropriate. 

Prerequisites.  The  background  required  can  be 
reduced  from  that  required  of  a  student  in  a  normal 
university  course.  For  example,  the  statistics  back¬ 
ground  might  well  be  waived.  Of  course,  the  objec¬ 
tives  that  can  be  achieved  depend  heavily  upon  the 
background  of  the  students. 


Resources/Support Materials

Instructors  should  seek  software  tools  for  studying 
and  computing  software  metrics.  What  is  available 
will  depend  upon  local  circumstances.  Many  univer¬ 
sities  have  software  available  for  computing  the 
simpler  metrics,  such  as  LOC,  v(G),  and  the 
Halstead  metrics.  However,  these  facilities  may  be 
difficult  to  use  or  not  available  in  the  most  desirable 
computing  environment.  Thus,  instructors  will  have 
to  search  out  the  best  tools  for  their  particular  situa¬ 
tions. 

In  addition,  it  is  highly  desirable  that  some  comput¬ 
erized  software  metrics  model  be  available  for  stu¬ 
dent  experimentation.  It  may  be  possible  to  acquire 
commercially  available  cost-estimating  tools  for  use 
in  a  class  environment  for  little  or  no  cost. 

There  are  other  resources  that  may  also  be  used  in 
presenting  and  discussing  this  material.  For  ex¬ 
ample,  personnel  from  local  industry  who  are  most 
knowledgeable  in  the  use  and  application  of  soft¬ 
ware  metrics  in  their  organization  can  be  asked  to 
provide  assistance  in  preparing  lectures  or  even  to 
deliver  guest  lectures.  Depending  upon  their  cir¬ 
cumstances  and  background,  students  may  be  asked 
to  report  on  or  make  recommendations  regarding  the 
use  of  software  metrics  in  their  work  environments. 


Exercises 


For  a  credit  course,  it  is  assumed  that  students  will 
be  assigned  homework  problems  related  to  the 
metrics  and  models  discussed  in  class.  These  ex¬ 
ercises  should  include  the  use  of  automated  tools  to 
compute  metrics  for  some  representative  examples 
of  software,  if  such  tools  are  available.  However,  it 
is  essential  that  students  do  some  manual  calcula¬ 
tions  of  metrics  such  as  LOC,  v(G),  and  Halstead’s 
metrics.  By  doing  so,  they  will  acquire  a  much  bet¬ 
ter  understanding  of  certain  fundamental  problems, 
for  example,  the  difficulty  in  defining  LOC  or  the 
counting  rules  for  Halstead’s  metrics. 
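
The difficulty of defining LOC can be demonstrated with a very small experiment. The Python sketch below counts the same source text under three different rules; the function name, the three rules, and the sample text are chosen only for illustration and do not represent any standard counting convention.

```python
def count_loc(source, comment_prefix="#"):
    """Count lines of code under three illustrative definitions."""
    lines = source.splitlines()
    non_blank = [ln for ln in lines if ln.strip()]
    # Lines consisting only of a comment are excluded; trailing comments are not.
    non_comment = [ln for ln in non_blank
                   if not ln.strip().startswith(comment_prefix)]
    return {
        "physical": len(lines),
        "non_blank": len(non_blank),
        "non_comment": len(non_comment),
    }


SAMPLE = """\
# compute a total

total = 0
for x in data:  # trailing comment
    total += x

print(total)
"""

print(count_loc(SAMPLE))
```

Even for this tiny fragment the three rules give three different counts, and further choices (for example, how to treat statements that span several lines) would add more variation. Students who do such counts by hand quickly see why a LOC figure is meaningless without a stated counting rule.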

Depending  upon  their  backgrounds  and  the  time 
available,  students  might  also  be  asked  to  do  a  proj¬ 
ect  that  implements  or  modifies  a  metrics  compu¬ 
tation  or  process  model.  Students  might  also  be 
asked  to  work  as  a  team  to  design  a  complete  metrics 
program  for  implementation  in  some  particular  soft¬ 
ware  development  environment. 

Abstract: One of the most important problems
faced by software developers and users is the
prediction of the size of a programming system and
its development effort. As an alternative to “size,”
one might deal with a measure of the “function”
that the software is to perform. Albrecht [1] has
developed a methodology to estimate the amount of
the “function” the software is to perform, in terms
of the data it is to use (absorb) and to generate
(produce). The “function” is quantified as “function
points,” essentially, a weighted sum of the
numbers of “inputs,” “outputs,” “master files,”
“inquiries” provided to, or generated by, the software.
This paper demonstrates the equivalence between
Albrecht’s external input/output data flow
representative of a program (the “function points”
metric) and Halstead’s [2] “software science” or
“software linguistics” model of a program as well
as the “soft content” variation of Halstead’s model
suggested by Gaffney [7].

Further, the high degree of correlation between
“function points” and the eventual “SLOC”
(source lines of code) of the program, and between
“function points” and the work-effort required to
develop the code, is demonstrated. The “function
point” measure is thought to be more useful than
“SLOC” as a prediction of work effort because
“function points” are relatively easily estimated
from a statement of basic requirements for a program
early in the development cycle.

The strong degree of equivalency between “function
points” and “SLOC” shown in the paper suggests a
two-step work-effort validation procedure, first
using “function points” to estimate “SLOC” and
then using “SLOC” to estimate the work-effort.
This approach would provide validation of application
development work plans and work-effort estimates
early in the development cycle. The approach
would also more effectively use the existing
base of knowledge on producing “SLOC” until a
similar base is developed for “function points.”

The paper assumes that the reader is familiar with
the fundamental theory of “software science”
measurements and the practice of validating estimates
of work-effort to design and implement software
applications (programs). If not, a review of
[1]-[3] is suggested.


This  paper  presents  a  comparison  of  SLOC  and 
function  points  as  predictors  of  software  develop¬ 
ment  effort,  using  Halstead’s  software  science  as 
theoretical  support  for  the  use  of  function  points. 
One  useful  feature  of  this  paper  is  an  appendix 
(more  than  three  pages)  that  provides  a  detailed  ex¬ 
planation  of  how  to  apply  the  function  point  meth¬ 
odology. 

Arthur85 

Arthur,  L.  J.  Measuring  Programmer  Productivity 
and  Software  Quality.  New  York:  John  Wiley, 
1985. 

Appendix A. ALC Reserved Words

Appendix B. COBOL Reserved Words

Appendix C. PL/I Reserved Words

The  author  presents  a  set  of  eleven  software  quality 
metrics,  including  correctness,  efficiency,  maintain¬ 
ability,  and  reliability.  These  eleven  metrics  are 
then  described  as  functions  of  a  more  basic  set  of 
some  22  different  software  quality  criteria.  The  au¬ 
thor  then  discusses  these  metrics  in  some  detail, 
with  specific  applications  to  various  programming 
languages.  This  book  may  be  of  more  interest  to 
practitioners  than  to  serious  students  of  software 
metrics.  There  appears  to  be  little  new  material, 
and  the  presentation  is  somewhat  redundant. 

Basili80

Basili,  V.  R.  Tutorial  on  Models  and  Metrics  for 
Software  Management  and  Engineering.  New  York: 
IEEE  Computer  Society  Press,  1980. 

This  is  a  tutorial  on  quantitative  methods  of  soft¬ 
ware  management  and  engineering.  A  quantitative 
methodology  is  needed  to  evaluate,  control,  and 
predict  software  development  and  maintenance 
costs.  This  quantitative  approach  allows  cost,  time, 
and  quality  tradeoffs  to  be  made  in  a  systematic 
manner.  The  tutorial  focuses  on  numerical  product- 
oriented  measures  such  as  size,  complexity,  and 
reliability  and  on  resource-oriented  measures  such 
as  cost,  schedules,  and  resources.  Twenty  articles 
from  software  engineering  literature  are  reprinted  in 
this  document.  The  articles  are  organized  into  the 
following  sections:  resource  models,  changes  and 
errors,  product  metrics,  and  data  collection.  Suc¬ 
cessful  application  of  the  techniques,  however,  re¬ 
quires  a  thorough  knowledge  of  the  project  under 
development  and  any  assumptions  made.  Only  then 
can  these  techniques  augment  good  managerial  and 
engineering  judgement. 

Basili84 

Basili, V. R. and D. M. Weiss. “A Methodology For
Collecting Valid Software Engineering Data.” IEEE
Trans. Software Eng. SE-10, 6 (Nov. 1984),
728-738.

Abstract:  An  effective  data  collection  method  for 
evaluating  software  development  methodologies 
and  for  studying  the  software  development  process 
is described. The method uses goal-directed data
collection  to  evaluate  methodologies  with  respect  to 
the  claims  made  for  them.  Such  claims  are  used  as  a 
basis  for  defining  the  goals  of  the  data  analysis, 
defining  a  set  of  data  categorization  schemes,  and 
designing  a  data  collection  form. 

The  data  to  be  collected  are  based  on  the  changes 
made  to  the  software  during  development,  and  are 
obtained  when  the  changes  are  made.  To  ensure 
accuracy  of  the  data,  validation  is  performed  con¬ 
currently  with  software  development  and  data  col¬ 
lection.  Validation  is  based  on  interviews  with 
those  people  supplying  the  data.  Results  from  using 
the methodology show that data validation is a necessary
part of change data collection. Without it, as
much  as  50  percent  of  the  data  may  be  erroneous. 

Feasibility  of  the  data  collection  methodology  was 
demonstrated  by  applying  it  to  five  different  proj¬ 
ects  in  two  different  environments.  The  application 
showed  that  the  methodology  was  both  feasible  and 
useful. 

This  article  describes  an  effective  data  collection 
method  for  studying  the  software  development 
process and evaluating software development
methodologies. The paper describes the
Goal/Question/Metric paradigm for data collection.

Basili87

Basili,  V.  R.  and  H.  D.  Rombach.  TAME:  Integrat¬ 
ing  Measurement  into  Software  Environments.  TR- 
1764,  University  of  Maryland,  Computer  Science 
Department,  1987. 

Abstract:  Based  upon  a  dozen  years  of  analyzing 
software  engineering  processes  and  products,  we 
propose  a  set  of  software  engineering  process  and 
measurement  principles.  These  principles  lead  to 
the  view  that  an  Integrated  Software  Engineering 
Environment  (ISEE)  should  support  multiple  proc¬ 
ess  models  across  the  full  software  life  cycle,  the 
technical  and  management  aspects  of  software  engi¬ 
neering,  and  the  planning,  construction,  and  feed¬ 
back  and  learning  activities.  These  activities  need 
to  be  tailored  to  the  specific  project  under  devel¬ 
opment  and  they  must  be  tractable  for  management 
control.  The  tailorability  and  tractability  attributes 
require  the  support  of  a  measurement  process.  The 
measurement  process  needs  to  be  top-down,  based 
upon operationally defined goals. The TAME project
uses the goal/question/metric paradigm to support
this type of measurement paradigm. It provides
for the establishment of project specific goals
and  corporate  goals  for  planning,  provides  for  the 
tracing  of  these  goals  throughout  the  software  life 
cycle  via  feedback  and  post  mortem  analysis,  and 
offers  a  mechanism  for  long  range  improvement  of 
all  aspects  of  software  development.  The  TAME 
system  automates  as  much  of  this  process  as  pos¬ 
sible,  by  supporting  goal  development  into  meas¬ 
urement  via  models  and  templates,  providing  evalu¬ 
ation  and  analysis  of  the  development  and  mainte¬ 
nance  processes,  and  creating  and  using  databases 
of  historical  data  and  knowledge  bases  that  incor¬ 
porate  experience  from  prior  projects. 

Ten  software  process  principles  and  fourteen  soft¬ 
ware  measurement  principles,  based  upon  a  dozen 
years  of  research  in  the  area,  are  presented.  The 
Goal/Question/Metric paradigm for designing software
measurement systems is also discussed.
TAME  stands  for  Tailoring  A  Measurement  Envi¬ 
ronment. 

Behrens83 

Behrens,  C.  A.  “Measuring  the  Productivity  of  Com¬ 
puter  Systems  Development  Activities  with  Function 
Points.”  IEEE  Trans.  Software  Eng.  SE-9,  6  (Nov. 
1983),  648-652. 

Abstract:  The  function  point  method  of  measuring 
application  development  productivity  developed  by 
Albrecht  is  reviewed  and  a  productivity  improve¬ 
ment  measure  introduced.  The  measurement  meth¬ 
odology  is  then  applied  to  24  development  projects. 
Size, environment, and language effects on productivity
are examined. The concept of a productivity
index  which  removes  size  effects  is  defined  and  an 
analysis  of  the  statistical  significance  of  results  is 
presented. 

This  is  a  report  of  a  relatively  successful  attempt  to 
correlate  function  point  values  with  productivity 
and  effort  values  in  a  production  environment. 
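
As a rough illustration of the function point calculation that this paper builds on, the sketch below computes an unadjusted count, an adjusted value, and a productivity figure. The weights are the commonly published average-complexity weights for Albrecht-style counting, and the adjustment 0.65 + 0.01 x (sum of ratings) is the commonly published form; both are assumptions for illustration and are not taken from [Behrens83]. All function and parameter names are hypothetical.

```python
# Average-complexity weights commonly quoted for Albrecht-style function points.
# Assumption: illustrative values only, not taken from [Behrens83].
AVERAGE_WEIGHTS = {
    "external_inputs": 4,
    "external_outputs": 5,
    "external_inquiries": 4,
    "internal_files": 10,
    "external_interfaces": 7,
}


def unadjusted_function_points(counts):
    """Weighted sum of the counted function types."""
    return sum(AVERAGE_WEIGHTS[kind] * n for kind, n in counts.items())


def adjusted_function_points(counts, ratings):
    """Scale the unadjusted count by 0.65 + 0.01 * (sum of ratings).

    `ratings` holds one 0-5 rating per general system characteristic
    (14 of them in the commonly published scheme).
    """
    if any(r < 0 or r > 5 for r in ratings):
        raise ValueError("each rating must be between 0 and 5")
    factor = 0.65 + 0.01 * sum(ratings)
    return unadjusted_function_points(counts) * factor


def productivity(function_points, work_months):
    """Function points delivered per work-month (hypothetical helper)."""
    return function_points / work_months
```

A project with 10 inputs, 5 outputs, 4 inquiries, 3 files, and 2 interfaces would have an unadjusted count of 125 under these assumed weights.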

Boehm76 

Boehm,  B.  W.,  J.  R.  Brown,  and  M.  Lipow. 
“Quantitative  Evaluation  of  Software  Quality.” 
Proc. 2nd Intl. Conf. on Software Engineering.
Long Beach, Calif.: IEEE Computer Society, Oct.
1976, 592-605. Reprinted in [Basili80], 218-231.

Abstract:  The  study  reported  in  this  paper  estab¬ 
lishes  a  conceptual  framework  and  some  key  initial 
results  in  the  analysis  of  the  characteristics  of  soft¬ 
ware  quality. 

The  software  quality  characteristics  delineated  in 
this  article  are  also  discussed  in  [Perlis81],  where 
they  are  compared  to  those  of  McCall,  et  al. 

Boehm81 

Boehm,  B.  W.  Software  Engineering  Economics. 
Englewood Cliffs, N. J.: Prentice-Hall, 1981.

Table  of  Contents 

Part I. Introduction: Motivation and Context
1 Case Study 1: Scientific American Subscription Processing
2 Case Study 2: An Urban School Attendance System
3 The Goals of Software Engineering

Part II. The Software Life-Cycle: A Quantitative Model
4 The Software Life-Cycle: Phases and Activities
5 The Basic COCOMO Model
6 The Basic COCOMO Model: Development Modes
7 The Basic COCOMO Model: Activity Distribution
8 The Intermediate COCOMO Model: Product Level Estimates
9 Intermediate COCOMO: Component Level Estimation

Part III. Fundamentals of Software Engineering Economics

Part IIIA. Cost-Effectiveness Analysis
10 Performance Models and Cost-Effectiveness Models
11 Production Functions: Economies of Scale
12 Choosing Among Alternatives: Decision Criteria

Part IIIB. Multiple-Goal Decision Analysis
13 Net Value and Marginal Analysis
14 Present versus Future Expenditure and Income
15 Figures of Merit
16 Goals as Constraints
17 Systems Analysis and Constrained Optimization
18 Coping with Unreconcilable and Unquantifiable Goals

Part IIIC. Dealing with Uncertainties, Risk, And The Value Of Information
19 Coping with Uncertainties: Risk Analysis
20 Statistical Decision Theory: The Value of Information

Part IV. The Art of Software Cost Estimation

Part IVA. Software Cost Estimation Methods And Procedures
21 Seven Basic Steps in Software Cost Estimation
22 Alternative Software Cost Estimation Methods

Part IVB. The Detailed COCOMO Model
23 Detailed COCOMO: Summary and Operational Description
24 Detailed COCOMO Cost Drivers: Product Attributes
25 Detailed COCOMO Cost Drivers: Computer Attributes
26 Detailed COCOMO Cost Drivers: Personnel Attributes
27 Detailed COCOMO Cost Drivers: Project Attributes
28 Factors Not Included in COCOMO
29 COCOMO Evaluations

Part IVC. Software Cost Estimation and Life-Cycle Management
30 Software Maintenance Cost Estimation
31 Software Life-Cycle Cost Estimation
32 Software Project Planning and Control
33 Improving Software Productivity

This  is  a  classic  text  on  software  engineering  eco¬ 
nomics.  It  presents  an  excellent,  detailed  discussion 
of  the  use  of  selected  software  metrics  in  one  partic¬ 
ular  software  development  process  model,  i.e., 
COCOMO,  which  was  developed  by  the  author. 
Otherwise  it  is  not  appropriate  as  a  text  for  this 
module;  its  scope  is  much  too  limited,  and  the  book 
is  now  somewhat  out  of  date. 
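
To give a feel for the kind of model the book develops, here is a minimal sketch of the Basic COCOMO effort and schedule equations. The coefficients are the values commonly published for the three development modes; treat them as assumptions to be checked against [Boehm81] before any real use. The function name and the dictionary layout are hypothetical.

```python
# Basic COCOMO coefficients (a, b, c, d) per development mode, as commonly
# published. Assumption: verify against [Boehm81] before relying on them.
#   effort   (person-months) = a * KDSI ** b
#   schedule (months)        = c * effort ** d
BASIC_COCOMO = {
    "organic": (2.4, 1.05, 2.5, 0.38),
    "semidetached": (3.0, 1.12, 2.5, 0.35),
    "embedded": (3.6, 1.20, 2.5, 0.32),
}


def basic_cocomo(kdsi, mode="organic"):
    """Return (effort, schedule) for a product of `kdsi` thousand
    delivered source instructions in the given development mode."""
    if kdsi <= 0:
        raise ValueError("size must be positive")
    a, b, c, d = BASIC_COCOMO[mode]
    effort = a * kdsi ** b
    schedule = c * effort ** d
    return effort, schedule


if __name__ == "__main__":
    effort, schedule = basic_cocomo(32, "organic")
    print(f"effort: {effort:.1f} person-months, schedule: {schedule:.1f} months")
```

Because the exponent b is greater than 1 in every mode, estimated effort grows faster than size, which is the diseconomy of scale that the model assumes.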

Boehm84 

Boehm,  B.  W.  “Software  Engineering  Economics.” 
IEEE Trans. Software Eng. SE-10, 1 (Jan. 1984),
4-21. 

Abstract:  This  paper  summarizes  the  current  state 
of  the  art  and  recent  trends  in  software  engineering 
economics.  It  provides  an  overview  of  economic 
analysis  techniques  and  their  applicability  to  soft¬ 
ware  engineering  and  management.  It  surveys  the 
field  of  software  cost  estimation,  including  the 
major  estimation  techniques  available,  the  state  of 
the  art  in  algorithmic  cost  models,  and  the  out¬ 
standing  research  issues  in  software  cost  estima¬ 
tion. 

The  cost  estimation  techniques  identified  are:  al¬ 
gorithmic  models,  expert  judgment,  analogy, 
Parkinson’s  principle,  price-to-win,  top-down,  and 
bottom-up.  Although  Parkinson’s  principle  and 
price-to-win  are  identified  as  unacceptable  methods, 
it  is  acknowledged  that,  of  the  other  methods,  none 
is demonstrably superior. Thus, since the methods
tend  to  complement  one  another,  best  results  will 
probably  come  from  using  some  combination  of  the 
other  techniques.  The  following  algorithmic  cost 
estimation  models  are  discussed:  Putnam’s  SLIM, 
The  Doty  Model,  RCA  PRICE  S,  COCOMO, 
Bailey-Basili,  Grumman  SOFTCOST,  Tausworthe 
Deep  Space  Network  (DSN)  model,  and  the  Jensen 
model.  Finally,  the  author  identifies  seven  major 
issues  needing  further  research — including  size  es¬ 
timation,  size  and  complexity  metrics,  cost-driver 
attributes,  cost  model  analysis  and  refinement, 
models  of  project  dynamics,  models  for  software 
evolution,  and  software  data  collection. 

Card87a 

Card,  D.  N.  and  W.  W.  Agresti.  “Resolving  the  Soft¬ 
ware  Science  Anomaly.”  J.  Syst.  and  Software  7,  1 
(March  1987),  29-35. 

Abstract:  The  theory  of  software  science  proposed 
by  Halstead  appears  to  provide  a  comprehensive 
model  of  the  program  construction  process.  Al¬ 
though  software  science  has  been  widely  criticized 
on  theoretical  grounds,  its  measures  continue  to  be 
used  because  of  apparently  strong  empirical  sup¬ 
port.  This  study  reexamined  one  basic  relationship 
proposed  by  the  theory:  that  between  estimated  and 
actual  program  length.  The  results  show  that  the 
apparent  agreement  between  these  quantities  is  a 
mathematical artifact. Analyses of both Halstead's
own  data  and  another  larger  data  set  confirm  this 
conclusion.  Software  science  has  neither  a  firm 
theoretical  nor  empirical  foundation. 

The  anomaly  referred  to  in  the  title  is  that  although 
a  high  correlation  between  the  actual  (observed) 
program  length  and  estimated  (calculated)  program 
length  appears  to  be  supported  by  empirical  studies, 
no  solid  theoretical  basis  has  been  established  for 
such  a  relationship.  The  authors  resolve  the 
anomaly  by  demonstrating  that  the  two  quantities 
are  defined  in  such  a  way  that  one  is  mathematically 
dependent  upon  the  other.  Thus,  the  strong  empiri¬ 
cal  support  previously  reported  apparently  has  not 
been  established  either. 

Card87b 

Card,  D.  N.  and  W.  W.  Agresti.  “Comments  on 
Resolving the Software Science Anomaly.” J. Syst.
and  Software  7,  1  (March  1987),  83-84. 

Abstract: Refer to the abstract for [Card87a].

The  authors  offer  a  rationale  for  [Card87a],  pointing 
out  that  users  of  software  analysis  tools  based  upon 
software science metrics may not be aware (but
should be) of the lack of theoretical and empirical
basis  for  these  metrics. 


Card88 

Card, D. N. and W. W. Agresti. “Measuring Software
Design Complexity.” J. Syst. and Software 8, 3
(June  1988),  185-197. 

Abstract:  Architectural  design  complexity  derives 
from  two  sources:  structural  (or  intermodule)  com¬ 
plexity  and  local  (or  intramodule)  complexity. 
These  complexity  attributes  can  be  defined  in  terms 
of functions of the number of I/O variables and
fanout  of  the  modules  comprising  the  design.  A 
complexity  indicator  based  on  these  measures 
showed  good  agreement  with  a  subjective  assess¬ 
ment  of  design  quality  but  even  better  agreement 
with  an  objective  measure  of  software  error  rate. 
Although  based  on  a  study  of  only  eight  medium- 
scale  scientific  projects,  the  data  strongly  support 
the  value  of  the  proposed  complexity  measure  in 
this  context.  Furthermore,  graphic  representations 
of  the  software  designs  demonstrate  structural  dif¬ 
ferences  corresponding  to  the  results  of  the  numer¬ 
ical  complexity  analysis.  The  proposed  complexity 
indicator  seems  likely  to  be  a  useful  tool  for  evalu¬ 
ating  design  quality  before  committing  the  design  to 
code. 

The  measure  proposed  by  the  authors  expresses  the 
total  complexity  of  a  software  design  as  the  sum  of 
the structural (intermodule) complexity and the local
(intramodule) complexity. The number of modules,
number of I/O variables, and degree of fanout
are  important  factors  in  determining  the  complexity. 

An important consideration for this metric is that
the  required  information  is  available  at  design  time 
and  before  code  is  produced.  The  approach  is 
similar  to  that  described  in  [Harrison87]. 
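
The additive structure described above can be sketched as follows. The specific functions used here (mean squared fanout for the structural term, and mean of I/O variables divided by fanout plus one for the local term) are one commonly cited form of the measure and should be read as an assumption; the abstract says only that both terms depend on I/O variables and fanout. The function name and input layout are hypothetical.

```python
def design_complexity(modules):
    """Return (structural, local, total) complexity for a design.

    `modules` is a list of (fanout, io_variables) pairs, one per module.
    Assumed forms (not confirmed by the abstract):
      structural = sum(fanout**2) / n
      local      = sum(io_variables / (fanout + 1)) / n
    """
    n = len(modules)
    if n == 0:
        raise ValueError("design must contain at least one module")
    structural = sum(fanout ** 2 for fanout, _ in modules) / n
    local = sum(io_vars / (fanout + 1) for fanout, io_vars in modules) / n
    return structural, local, structural + local


if __name__ == "__main__":
    # One root module calling two leaf modules (made-up illustrative design).
    design = [(2, 6), (0, 4), (0, 2)]
    print(design_complexity(design))
```

Every input to this calculation is available from a structure chart and interface descriptions, which is why such a measure can be applied before any code exists.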

Cerino86 

Cerino,  D.  A.  “Software  Quality  Measurement  Tools 
And  Techniques.”  Proc.  COMPSAC  86.  Washing¬ 
ton,  D.  C.:  IEEE  Computer  Society,  Oct.  1986, 
160-167. 

Abstract:  This  paper  describes  research  being  per¬ 
formed  by  RADC,  to  develop  quality  measurement 
computer  based  tools  to  support  quality  evaluation 
during  each  activity  of  the  software  life  cycle.  Cur¬ 
rent  work  has  provided  a  baseline  quality  measure¬ 
ment  tool  to  monitor  the  overall  quality  and 
resource  expenditures  of  developing  software  by 
collecting  (semi-automated),  storing,  and  analyzing 
software  measurement  data  for  software  acquisition 
and  software  project  personnel.  This  tool  is  being 
used  in  the  prediction  and  assessment  of  developing 
software.  In  the  future,  this  tool  will  evolve  into  a 
metrics  researcher's  workbench  tuned  for  software 
development  personnel  and  will  be  completely  auto¬ 
mated.  Efforts  are  also  underway  to  specify  the 
data  collection  mechanisms  which  can  be  embedded 
within  software  engineering  environment  tools.  All 
three approaches are presented.

This paper reports on work being done at Rome Air
Development  Center  to  develop  automated  tools  to 
support  software  quality  evaluation  during  each  ac¬ 
tivity  of  the  development  life  cycle. 

Christensen81

Christensen,  K.,  G.  P.  Fitsos,  and  C.  P.  Smith.  “A 
Perspective  on  Software  Science.”  IBM  Systems 
J. 20, 4 (1981), 372-387.

Abstract:  Provides  an  overview  of  a  new  approach 
to  the  measurement  of  software.  The  measurements 
are  based  on  the  count  of  operators  and  operands 
contained  in  a  program.  The  measurement  method¬ 
ologies  are  consistent  across  programming  lan¬ 
guage  barriers.  Practical  significance  is  discussed, 
and  areas  are  identified  for  additional  research  and 
validation. 

The  authors  review  Halstead’s  software  science. 
They  conclude  that  software  science  “offers  a  meth¬ 
odology  not  only  for  making  measurements,  but 
also  for  calibrating  the  measuring  instruments.” 

Conte86 

Conte,  S.  D.,  H.  E.  Dunsmore,  and  V.  Y.  Shen. 
Software  Engineering  Metrics  and  Models.  Menlo 
Park,  Calif.:  Benjamin/Cummings,  1986. 

Table  of  Contents 

1 The Role of Metrics and Models in Software Development
2 Software Metrics
3 Measurement and Analysis
4 Small Scale Experiments, Micro-Models of Effort, and Programming Techniques
5 Macro-Models of Productivity
6 Macro-Models for Effort Estimation
7 Defect Models
8 The Future of Software Engineering Metrics and Models
Appendix A. Statistical Tables
Appendix B. Data Used in The Text

The  basic  outline  of  this  book  is  similar  to  that  of 
this  module.  It  is  intended  to  be  used  as  a  textbook, 
and  covers  most  of  the  topics  shown  in  the  module 
outline. 

Cote88 

Cote,  V.,  P.  Bourque,  S.  Oligny,  and  N.  Rivard. 
“Software  Metrics:  An  Overview  of  Recent 
Results.” J. Syst. and Software 8, 2 (March 1988),
121-131. 

Abstract:  The  groundwork  for  software  metrics 
was  established  in  the  seventies,  and  from  these 
earlier works, interesting results have emerged in
the eighties. Over 120 of the many publications on
software  metrics  that  have  appeared  since  1980  are 
classified  and  presented  in  five  tables  that  comprise, 
respectively,  (1)  the  use  of  classic  metrics,  (2)  a 
description  of  new  metrics,  (3)  software  metrics 
through  the  life  cycle,  (4)  code  metrics  and  popular 
programming  languages,  and  (5)  various  metric- 
based  estimation  models. 

This  is  an  excellent  overview  of  the  software 
metrics  literature,  especially  for  the  period  1981 
through  1986.  It  cites  and  classifies  over  120  publi¬ 
cations.  Six  classic  papers  prior  to  1981  are  also 
included,  beginning  with  McCabe’s  1976  paper. 
Especially  with  the  five  tables  described  above,  this 
paper  should  prove  invaluable  to  anyone  interested 
in  consulting  the  literature  for  this  period. 

Coulter83 

Coulter,  N.  S.  “Software  Science  and  Cognitive 
Psychology.” IEEE Trans. Software Eng. SE-9, 2
(March  1983),  166-171. 

Abstract:  Halstead  proposed  a  methodology  for 
studying  the  process  of  programming  known  as  soft¬ 
ware  science.  This  methodology  merges  theories 
from  cognitive  psychology  with  theories  from  com¬ 
puter  science.  There  is  evidence  that  some  of  the 
assumptions  of  software  science  incorrectly  apply 
the  results  of  cognitive  psychology  studies.  Halstead 
proposed  theories  relative  to  human  memory 
models  that  appear  to  be  without  support  from 
psychologists.  Other  software  scientists,  however, 
report  empirical  evidence  that  may  support  some  of 
those  theories.  This  anomaly  places  aspects  of  soft¬ 
ware  science  in  a  precarious  position.  The  three 
conflicting  issues  discussed  in  this  paper  are  1) 
limitations  of  short-term  memory  and  a  number  of 
subroutine  parameters,  2)  searches  in  human  mem¬ 
ory  and  programming  effort,  and  3)  psychological 
time  and  programming  time. 

This  paper  is  a  review  of  Halstead’s  theory,  and 
critical  discussion  of  Halstead’s  use  of  relevant  the¬ 
ories  from  the  field  of  psychology. 

Curtis79a

Curtis,  B.,  S.  B.  Sheppard,  P.  Milliman,  M.  A.  Borst, 
and  T.  Love.  “Measuring  the  Psychological  Com¬ 
plexity  of  Software  Maintenance  Tasks  with  the 
Halstead  and  McCabe  Metrics.”  IEEE  Trans.  Soft¬ 
ware  Eng.  SE-5, 2  (March  1979),  96-104. 

Abstract:  Three  software  complexity  measures 
(Halstead’s E, McCabe's v(G), and the length as
measured by number of statements) were compared
to programmer performance on two software
maintenance tasks. In an experiment on understanding,
length and v(G) correlated with the
percent of statements correctly recalled. In an experiment
on modification, most significant correlations
were obtained with metrics computed on modified
rather than unmodified code. All three metrics
correlated  with  both  the  accuracy  of  the  modifica¬ 
tion  and  the  time  to  completion.  Relationships  in 
both  experiments  occurred  primarily  in  unstruc¬ 
tured  rather  than  structured  code,  and  in  code  with 
no  comments.  The  metrics  were  also  most  predictive 
of  performance  for  less  experienced  programmers. 
Thus,  these  metrics  appear  to  assess  psychological 
complexity  primarily  where  programming  practices 
do  not  provide  assistance  in  understanding  the 
code. 

This  paper  investigates  the  extent  to  which  the 
Halstead  (E)  and  McCabe  (v(G))  metrics  assess  the 
psychological  complexity  of  understanding  and 
modifying  software.  The  authors  claim  that 
“Halstead’s  metric  ...  was  proposed  as  an  absolute 
measure  of  psychological  complexity  (i.e.,  number 
of  mental  discriminations).”  Furthermore,  Mc¬ 
Cabe’s  measure,  although  not  formulated  in 
psychological  terms,  “may  prove  to  be  a  correlated 
measure  of  psychological  complexity.”  Two  experi¬ 
ments  were  performed,  using  professional  program¬ 
mers:  1)  understanding  an  existing  program  and  2) 
accurately  implementing  modifications  to  it.  Each 
experiment  involved  36  programmers  with  an 
average  of  more  than  5  years  of  professional  experi¬ 
ence.  Results  in  the  first  experiment  indicated  that 
the  Halstead  and  McCabe  metrics  correlated  well 
with  each  other  (0.84),  but  not  with  LOC  (0.47, 
0.64).  Correlations  with  measured  performances 
were  not  as  high,  ranging  from  -0.10  for  E,  to  -0.61 
for  LOC.  After  adjustments  in  the  data,  correlations 
were -0.73 (E), -0.21 (v(G)), and -0.65 (LOC). In
the  second  experiment,  results  indicated  that  all 
three  metrics  correlated  well  with  each  other  (0.85 
to  0.97).  By  comparison,  correlations  with  perfor¬ 
mance  were  not  high  (<  0.57),  but  the  authors  claim 
that  “their  magnitudes  are  typical  of  significant 
results  reported  in  human  factors  experiments.” 
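
For readers unfamiliar with McCabe's v(G), the following minimal sketch computes it from a control-flow graph using the standard definition v(G) = e - n + 2p, where e is the number of edges, n the number of nodes, and p the number of connected components. The edge-list representation, the node labels, and the function name are assumptions chosen for illustration.

```python
def cyclomatic_complexity(edges, components=1):
    """Return v(G) = e - n + 2p for a control-flow graph.

    `edges` is a list of (source, destination) node labels. Nodes are
    inferred from the edges, so every node must appear in some edge.
    """
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * components


if __name__ == "__main__":
    # An if/else followed by a while loop (hypothetical example graph).
    flow = [
        ("entry", "cond"),
        ("cond", "then"),
        ("cond", "else"),
        ("then", "join"),
        ("else", "join"),
        ("join", "loop"),
        ("loop", "body"),
        ("body", "loop"),
        ("loop", "exit"),
    ]
    print(cyclomatic_complexity(flow))
```

For a structured program with a single entry and exit, the result equals the number of binary decisions plus one; the example graph has two decisions and nine edges over eight nodes.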

Curtis79b 

Curtis, B., S. B. Sheppard, and P. Milliman. “Third
Time Charm: Stronger Prediction of Programmer
Performance by Software Complexity Metrics.”
Proc. 4th Int. Conf. on Software Engineering. New
York: IEEE, Sept. 1979, 356-360.

Abstract:  This  experiment  is  the  third  in  a  series 
investigating  characteristics  of  software  which  are 
related  to  its  psychological  complexity.  A  major 
focus  of  this  research  has  been  to  validate  the  use  of 
software  complexity  metrics  for  predicting  pro¬ 
grammer  performance.  In  this  experiment  we  im¬ 
proved  experimental  procedures  which  produced 
only  modest  results  in  the  previous  two  studies.  The 
experimental task required 54 experienced Fortran
programmers to locate a single bug in each of three
programs. Performance was measured by the time
to  locate  and  successfully  correct  the  bug.  Much 
stronger  results  were  obtained  than  in  earlier 
studies. Halstead’s E proved to be the best predictor
of performance, followed by McCabe's v(G) and
the  number  of  lines  of  code. 

This  paper  is  a  report  on  the  third  in  a  series  of 
experiments  on  software  complexity  metrics, 
specifically  McCabe’s  v(G),  Halstead’s  E,  and 
LOC.  Intercorrelations  of  metrics,  when  applied  at 
the subroutine level, were: 0.92 for E:v, 0.89 for
LOC:E and 0.81 for LOC:v. Intercorrelations of
metrics, when applied at the program level, were:
0.76 for E:v, 0.56 for LOC:E and 0.90 for LOC:v.
Correlations  of  these  metrics  with  measured  perfor¬ 
mances  ranged  from  0.52  to  0.75.  These  results  are 
considerably  better  than  those  attained  in  previous 
experiments, e.g., as reported in [Curtis79a].

DeMarco82 

DeMarco,  T.  Controlling  Software  Projects:  Man¬ 
agement,  Measurement  &  Estimation.  New  York: 
Yourdon  Press,  1982. 
Appendix B. Notational Conventions for Specification and Design Models
Appendix C. A Tailored Primer on Statistics
Appendix D. Sample Program to Compute Code Volume

This  is  primarily  a  book  on  software  project  man¬ 
agement.  However,  it  recognizes  the  importance  of 
models  and  metrics  in  this  process,  and  much  of  the 
book  deals  with  these  topics.  Of  particular  interest 
is  the  development  of  specification  metrics  that  are 
available  early  in  the  development  cycle. 

Elshoff76 

Elshoff,  J.  L.  “Measuring  Commercial  PL/I  Pro¬ 
grams  Using  Halstead’s  Criteria.”  ACM  SIGPLAN 
Notices 11, 5 (May 1976), 38-46.

Abstract:  In  1972  Halstead  first  reported  his  inves¬ 
tigation  into  natural  laws  of  algorithm  analogous  to 
laws  of  natural  or  physical  sciences.  The  basic  idea 
is  to  separate  the  physical  structure  of  algorithms 
from  the  logical  structure  of  algorithms.  His  theory 
has  been  refined  since  that  time.  Furthermore,  the 
theory  has  been  applied  to  algorithms  in  different 
languages  and  different  environments.  In  this 
study,  Halstead's  criteria  are  applied  to  154  PL/I 
programs.  This  sample  contains  the  largest  algo¬ 
rithms  to  be  measured  by  his  methods  to  date.  A 
subset  of  120  of  the  programs  has  been  measured 
previously  by  other  techniques  which  describe  the 
basic  attributes  of  the  programs  herein  discussed. 

The correlation between observed program length,
N, and calculated program length is investigated.
Of the 154 programs, 34 have been developed using
structured programming techniques, while the other
120 were not. Correlations between observed and
calculated values of N are reported to be 0.985 (for
the structured programs) and 0.976 (for the others).
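
The two quantities being correlated can be sketched as follows, using Halstead's standard definitions: observed length is the total number of operator and operand occurrences, and calculated length is n1 log2 n1 + n2 log2 n2, where n1 and n2 are the numbers of unique operators and unique operands. The function names are hypothetical, and the counts in the example are made up for illustration.

```python
import math


def observed_length(total_operators, total_operands):
    """Observed program length N = N1 + N2."""
    return total_operators + total_operands


def calculated_length(unique_operators, unique_operands):
    """Halstead's length estimate: n1 * log2(n1) + n2 * log2(n2).

    A zero count contributes nothing (the limit of x * log2(x) at 0).
    """
    def term(count):
        return count * math.log2(count) if count > 0 else 0.0

    return term(unique_operators) + term(unique_operands)


if __name__ == "__main__":
    # Made-up counts for a small program.
    n_observed = observed_length(total_operators=50, total_operands=40)
    n_calculated = calculated_length(unique_operators=8, unique_operands=16)
    print(n_observed, n_calculated)
```

Note that the estimate depends only on the vocabulary sizes, so any study comparing it with the observed length is sensitive to how operators and operands are counted.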

Elshoff78 

Elshoff,  J.  L.  “An  Investigation  into  the  Effects  of 
the  Counting  Method  Used  on  Software  Science 
Measurements.”  ACM  SIGPLAN  Notices  13,  2  (Feb. 
1978),  30-45. 

Abstract:  Professor  Maurice  Halstead  of  Purdue 
University  first  defined  a  set  of  properties  of  algo¬ 
rithms  in  1972.  The  properties  are  defined  in  terms 
of  the  number  of  unique  operators,  unique 
operands,  total  operators,  and  total  operands  used 
to  express  the  algorithm.  Since  1972,  independent 
experiments  have  measured  various  sets  of  algo¬ 
rithms  and  have  supported  Halstead’s  theories  con¬ 
cerning  these  properties.  Also,  new  properties  have 
been  defined  and  experiments  performed  to  study 
them. 

This paper reports a study in which different methods
of counting operators and operands are applied
to a fixed set of 34 algorithms written in PL/I.
Some properties of the algorithms vary significantly
depending on the counting method chosen; other
properties  remain  stable.  Although  no  one  counting 
method  can  be  shown  to  be  best,  the  results  do  in¬ 
dicate  the  importance  of  the  counting  method  to  the 
overall  measurement  of  an  algorithm.  Moreover, 
the  results  provide  a  reminder  of  how  sensitive 
some  of  the  measurements  are  and  of  how  careful 
researchers  must  be  when  drawing  conclusions 
from  software  science  measurements. 

The  author  investigates  the  effect  of  variations  in 
counting  methods.  Eight  different  counting  meth¬ 
ods  were  applied  to  34  different  PL/I  programs. 
Results:  Length  (N)  and  volume  (V)  are  relatively 
insensitive, while level (L) and effort (E) are much
more  sensitive  to  the  counting  method. 
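
The sensitivity described above can be seen in a small sketch that derives N, V, L, and E from lists of operator and operand occurrences, using Halstead's standard definitions (V = N log2 n, estimated L = (2/n1)(n2/N2), E = V/L). The two counting conventions compared here, counting or ignoring the statement terminator as an operator, are a made-up illustration and are not among the eight methods studied in the paper. The function name is hypothetical.

```python
import math


def software_science(operators, operands):
    """Return (N, V, L, E) from lists of operator and operand occurrences.

    N = N1 + N2, V = N * log2(n1 + n2),
    L = (2 / n1) * (n2 / N2)  (the estimated program level), E = V / L.
    """
    if not operators or not operands:
        raise ValueError("need at least one operator and one operand")
    n1, n2 = len(set(operators)), len(set(operands))
    big_n1, big_n2 = len(operators), len(operands)
    length = big_n1 + big_n2
    volume = length * math.log2(n1 + n2)
    level = (2 / n1) * (n2 / big_n2)
    effort = volume / level
    return length, volume, level, effort


if __name__ == "__main__":
    # Tokens of the two statements "a = a + b; b = a + b;".
    operands = ["a", "a", "b", "b", "a", "b"]
    without_terminator = ["=", "+", "=", "+"]
    with_terminator = ["=", "+", ";", "=", "+", ";"]
    print(software_science(without_terminator, operands))
    print(software_science(with_terminator, operands))
```

In this toy case, counting the terminator raises N from 10 to 12 but more than doubles E, which is the same direction of effect that the paper reports for length versus effort.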

Ferrari86 

Ferrari,  D.  “Considerations  on  the  Insularity  of  Per¬ 
formance  Evaluation.”  IEEE  Trans.  Software  Eng. 
SE-12,  6  (June  1986),  678-683. 

Abstract:  It  is  argued  that  systems  performance 
evaluation,  in  the  first  20  years  of  its  existence,  has 
developed  in  substantial  isolation  from  such  disci¬ 
plines  as  computer  architecture,  system  organiza¬ 
tion,  operating  systems,  and  software  engineering. 
The  possible  causes  for  this  phenomenon,  which 
seems  to  be  unique  in  the  history  of  engineering,  are 
explored.  Its  positive  and  negative  effects  on  com¬ 
puter  science  and  technology,  as  well  as  on  perfor¬ 
mance  evaluation  itself,  are  discussed.  The  draw¬ 
backs  of  isolated  development  outweigh  its  advan¬ 
tages.  Thus,  instructional  and  research  initiatives  to 
foster  the  rapid  integration  of  the  performance  eval¬ 
uation  viewpoint  into  the  mainstream  of  computer 
science  and  engineering  are  proposed. 

This  article  discusses  the  degree  of  isolation  of  per¬ 
formance  evaluation  studies  from  other  computer 
science/software  engineering  activities.  Although 
performance  evaluation  is  now  considered  a  sepa¬ 
rate  field,  the  author  questions  whether  this  is  desir¬ 
able  and  suggests  that  performance  evaluation  con¬ 
siderations  should  be  introduced  into  computer  sci¬ 
ence  and  engineering  courses  in  general. 

Fitzsimmons78

Fitzsimmons,  A.  and  T.  Love.  “A  Review  and  Eval¬ 
uation  of  Software  Science.”  ACM  Computing  Sur¬ 
veys  10,  1  (March  1978),  3-18. 

Abstract:  During  recent  years,  there  have  been 
many  attempts  to  define  and  measure  the  "complex¬ 
ity"  of  a  computer  program.  Maurice  Halstead  has 
developed  a  theory  that  gives  objective  measures  of 
software  complexity.  Various  studies  and  experi¬ 
ments  have  shown  that  the  theory's  predictions  of 
the number of bugs in programs and of the time
required to implement a program are amazingly accurate.
It is a promising theory worthy of much
more  probing  scientific  investigation. 

This  paper  reviews  the  theory,  called  "software 
science,"  and  the  evidence  supporting  it.  A  brief 
description  of  a  related  theory,  called  "software 
physics," is included.

This  article  is  one  of  the  earliest  published  critical 
reviews  of  Halstead’s  work  on  software  science. 

Grady87 

Grady,  R.  B.  and  D.  R.  Caswell.  Software  Metrics: 
Establishing  a  Company-Wide  Program.  Englewood 
Cliffs, N. J.: Prentice-Hall, 1987.

This book is a classic in software metrics, the original
book by Halstead expounding the principles of
software  science.  Principal  attractions  of  the  theory 
as  presented  here  are  its  high  degree  of  agreement 
with  selected  empirical  data  and  its  distinction  of 
providing a unified theory of software metrics. Unfortunately,
a number of later works have pointed
out several difficulties in the formulation of the theory
and its empirical validation, e.g., see [Shen83]

DeKock. “Applying Software Complexity Metrics to
Program  Maintenance.”  Computer  15,  9  (Sept. 
1982),  65-79. 

Abstract:  The  authors  find  that  predicting  software 
complexity  can  save  millions  in  maintenance  costs, 
but  while  current  measures  can  be  used  to  some 
degree,  most  are  not  sufficiently  sensitive  or  com¬ 
prehensive.  They  examine  some  complexity  metrics 
in  use. 

This  is  primarily  a  survey  of  more  than  a  dozen 
complexity  measures  currently  in  use.  Despite  the 
article’s title, little guidance is given on how to apply
these to the software maintenance area.

Harrison87

Harrison, W. and C. Cook. “A Micro/Macro Measure
of Software Complexity.” J. Syst. and Software
7, 3 (Sept. 1987), 213-219.

Abstract:  A  software  complexity  metric  is  a  quanti¬ 
tative  measure  of  the  difficulty  of  comprehending 
and  working  with  a  specific  piece  of  software.  The 
majority  of  metrics  currently  in  use  focus  on  a 
program’s  "microcomplexity."  This  refers  to  how 
difficult  the  details  of  the  software  are  to  deal  with. 
This  paper  proposes  a  method  of  measuring  the 
"macrocomplexity," i.e., how difficult the overall
structure  of  the  software  is  to  deal  with,  as  well  as 
the  microcomplexity.  We  evaluate  this  metric  using 
data  obtained  during  the  development  of  a 
compiler/environment  project,  involving  over 
30,000 lines of C code. The new metric's performance
is compared to the performance of several
other  popular  metrics,  with  mixed  results.  We  then 
discuss  how  these  metrics,  or  any  other  metrics, 
may  be  used  to  help  increase  the  project  manage¬ 
ment  efficiency. 

The  authors  propose  a  software  complexity  metric 
incorporating  both  the  micro  (intra-sub-program 
level)  and  macro  (inter-program  level)  complexity 
contributed  by  each  subprogram.  The  metric 
(MMC)  is  compared  with  other  metrics  such  as 
those  of  Hall  and  Preisser,  Henry  and  Kafura, 
McCabe, Halstead, lines of code, and number of
procedures.  The  new  metric  correlated  better  (0.82) 
with  the  basic  error  rates  than  the  other  five  metrics. 
However,  in  identifying  software  modules  with  ex¬ 
ceptional  error  rates,  it  did  little  better  than  the  other 
metrics, and slightly worse than DSLOC.

Henry81

Henry, S., D. Kafura, and K. Harris. “On the
Relationships Among Three Software Metrics.”
Performance Eval. Rev. 10, 1 (Spring 1981), 81-88.

Abstract:  Automatable  metrics  of  software  quality 
appear  to  have  numerous  advantages  in  the  design, 
construction  and  maintenance  of  software  systems. 
While  numerous  such  metrics  have  been  defined, 
and  several  of  them  have  been  validated  on  actual 
systems,  significant  work  remains  to  be  done  to  es¬ 
tablish  the  relationships  among  these  metrics.  This 
paper  reports  the  results  of  correlation  studies 
made  among  three  complexity  metrics  which  were 
applied  to  the  same  software  system.  The  three 
complexity  metrics  used  were  Halstead's  effort, 
McCabe's  cyclomatic  complexity  and  Henry  and 
Kafura's information flow complexity. The common
software system was the UNIX operating system.
The primary result of this study is that Halstead's
and McCabe's metrics are highly correlated while
the  information  flow  metric  appears  to  be  an  inde¬ 
pendent  measure  of  complexity. 

The  results  of  this  study  show  a  high  correlation  of 
all  three  metrics  with  the  number  of  errors  in  the 
software:  0.89  for  Halstead’s  E,  0.95  for  informa¬ 
tion  flow,  and  0.96  for  McCabe’s  metric.  In  addi¬ 
tion,  Halstead’s  metric  and  McCabe’s  metrics  ap¬ 
pear  to  be  highly  related  to  one  another  (0.84). 
However,  the  information  flow  metric  correlates 
poorly  with  either  the  Halstead  (0.38)  or  the 
McCabe  (0.35)  metric.  This  may  indicate  that  while 
all  three  metrics  are  reasonable  predictors  of  error 
rates,  the  information  flow  metric  is  somewhat  or¬ 
thogonal  to  the  other  two  complexity  metrics. 


Henry84 

Henry,  S.  and  D.  Kafura.  “The  Evaluation  of  Soft¬ 
ware  Systems’  Structure  Using  Quantitative  Soft¬ 
ware  Metrics.”  Software — Practice  and  Experience 
14,  6  (June  1984),  561-573. 

Abstract:  The  design  and  analysis  of  the  structure 
of  software  systems  has  typically  been  based  on 
purely  qualitative  grounds.  In  this  paper  we  report 
on  our  positive  experience  with  a  set  of  quantitative 
measures  of  software  structure.  These  metrics, 
based  on  the  number  of  possible  paths  of  informa¬ 
tion  flow  through  a  given  component,  were  used  to 
evaluate  the  design  and  implementation  of  a  soft¬ 
ware  system  (the  UNIX  operating  system  kernel) 
which  exhibits  the  interconnectivity  of  components 
typical  of  large-scale  software  systems.  Several  ex¬ 
amples  are  presented  which  show  the  power  of  this 
technique  in  locating  a  variety  of  both  design  and 
implementation  defects.  Suggested  repairs,  which 
agree  with  the  commonly  accepted  principles  of 
structured  design  and  programming,  are  presented. 
The  effect  of  these  alterations  on  the  structure  of  the 
system  and  the  quantitative  measurements  of  that 
structure  lead  to  a  convincing  validation  of  the  util¬ 
ity  of  information  flow  metrics. 

This  is  an  important  paper,  in  the  sense  that  the 
information  flow  metric  developed  is  shown  to  be 
related  to  software  complexities  and  thus  to  poten¬ 
tial  problem  areas  of  the  UNIX  operating  system. 
This  information  is  then  used  to  guide  efforts  to 
redesign  those  portions  of  the  system  that  appear  to
be  overly  complex.  Of  special  note  here  is  the  fact 
that  these  information  flow  metrics  may  be  com¬ 
puted  and  utilized  in  the  software  design  process, 
prior  to  the  generation  of  any  program  code. 
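
As an illustration only, the information flow measure can be sketched in a few lines of Python. The formula used, length * (fan-in * fan-out)^2, is the form in which Henry and Kafura's metric is usually stated; the `Procedure` record, its field names, and the sample figures are hypothetical and are not taken from the paper. Setting `length` to 1 gives a value that depends on flow counts alone, which is one way such a measure might be obtained at design time, before any code exists.

```python
from dataclasses import dataclass


@dataclass
class Procedure:
    """Hypothetical record of one procedure's measurements."""
    name: str
    length: int   # procedure size, e.g. lines of code (use 1 at design time)
    fan_in: int   # number of information flows into the procedure
    fan_out: int  # number of information flows out of the procedure


def information_flow_complexity(p: Procedure) -> int:
    """Return length * (fan_in * fan_out) ** 2 for one procedure."""
    return p.length * (p.fan_in * p.fan_out) ** 2


def rank_by_complexity(procedures):
    """Order procedures from most to least complex."""
    return sorted(procedures, key=information_flow_complexity, reverse=True)


if __name__ == "__main__":
    # Made-up figures, for illustration only.
    sample = [
        Procedure("proc_a", length=10, fan_in=2, fan_out=3),
        Procedure("proc_b", length=40, fan_in=1, fan_out=1),
    ]
    for p in rank_by_complexity(sample):
        print(p.name, information_flow_complexity(p))
```

Because the flow term is squared, a short procedure with many connections (proc_a) can rank well above a longer but isolated one (proc_b), which is the kind of candidate for redesign the paper is concerned with.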

Jones84 

Jones,  T.  C.  “Reusability  in  Programming:  A  Survey 
of  the  State  of  the  Art.”  IEEE  Trans.  Software  Eng. 
SE-10,  5  (Sept.  1984),  488-494. 

Abstract:  As  programming  passes  the  30  year  mark 
as  a  professional  occupation,  an  increasingly  large 
number  of  programs  are  in  application  areas  that 
have  been  automated  for  many  years.  This  fact  is 
changing  the  technology  base  of  commercial  pro¬ 
gramming,  and  is  opening  up  new  markets  for  stan¬ 
dard  functions,  reusable  common  systems,  modules, 
and  the  tools  and  support  needed  to  facilitate 
searching  out  and  incorporating  existing  code  seg¬ 
ments.  This  report  addresses  the  1984  state  of  the 
art  in  the  domains  of  reusable  design,  common  sys¬ 
tems,  reusable  programs,  and  reusable  modules  or 
subroutines.  If  current  trends  toward  reusability 
continue,  the  amount  of  reused  logic  and  reused 
code  in  commercial  programming  systems  may  ap¬ 
proach  50  percent  by  1990.  However,  major  efforts 
will  be  needed  in  the  areas  of  reusable  data,  reusable
architectures,  and  reusable  design  before  reusable
code  becomes  a  sound  basic  technology.

The  author  includes  interesting  statistics  on  pro¬ 
grammer  and  software  populations  in  a  survey  of 
the  current  status  of  this  possible  key  to  program¬ 
ming  productivity. 

Jones86 

Jones,  T.  C.  Programming  Productivity.  New  York: 
McGraw-Hill,  1986. 

Appendix  A.  Description  of  the  SPQR  Model

This  book  is  primarily  a  study  of  programming 
productivity,  especially  as  it  might  be  predicted  by 
the  Software  Productivity,  Quality,  and  Reliability 
(SPQR)  model  developed  by  the  author.  It  enumer¬ 
ates  a  total  of  20  major  and  25  less  significant  fac¬ 
tors  that  influence  productivity,  many  of  which  are 
input  to  the  SPQR  model.  This  book  does  not  pos¬ 
sess  sufficient  breadth  in  software  metrics  to  serve 
as  a  text  for  this  module,  but  does  contain  illuminat¬ 
ing  discussions  of  some  currently  used  metrics  and 
problems  associated  with  them. 

Kafura81 

Kafura,  D.  and  S.  Henry.  “Software  Quality  Metrics 
Based  on  Interconnectivity.”  J.  Syst.  and  Software  2, 
2  (June  1981),  121-131. 

Abstract:  States  a  set  of  criteria  that  has  guided  the 
development  of  a  metric  system  for  measuring  the 
quality  of  a  large-scale  software  product.  This 
metric  system  uses  the  flow  of  information  within 
the  system  as  an  index  of  system  interconnectivity. 
Based  on  this  observed  interconnectivity,  a  variety 
of  software  metrics  can  be  defined.  The  types  of 
software  quality  features  that  can  be  measured  by 
this  approach  are  summarized.  The  data-flow  anal¬ 
ysis  techniques  used  to  establish  the  paths  of  infor¬ 
mation  flow  are  explained  and  illustrated.  Finally, 
a  means  of  integrating  various  metrics  and  models 
into  a  comprehensive  software  development  envi¬ 
ronment  is  discussed.  This  possible  integration  is 
explained  in  terms  of  the  Gandalf  system  currently 
under  development  at  Carnegie  Mellon  University. 

The  authors  propose  a  quality  metric  for  large-scale
software  products,  using  the  program  information 
flow  as  a  measure  of  system  interconnectivity. 
Results  of  application  to  UNIX  systems  are  dis¬ 
cussed. 


Kafura85 

Kafura,  D.  and  J.  Canning.  “A  Validation  of  Soft¬ 
ware  Metrics  Using  Many  Metrics  and  Two 
Resources.”  Proc.  8th  Intl.  Conf.  on  Software
Engineering.  Washington,  D.  C.:  IEEE  Computer 
Society  Press,  1985,  378-385. 

Abstract:  In  this  paper  are  presented  the  results  of 
a  study  in  which  several  production  software  sys¬ 
tems  are  analyzed  using  ten  software  metrics.  The 
ten  metrics  include  both  measures  of  code  details, 
measures  of  structure,  and  combinations  of  these 
two.  Historical  data  recording  the  number  of  er¬ 
rors  and  the  coding  time  of  each  component  are 
used  as  objective  measures  of  resource  expenditure 
of  each  component.  The  metrics  are  validated  by 
showing:  (1)  the  metrics  singly  and  in  combination
are  useful  indicators  of  those  components  which  re¬ 
quire  the  most  resources,  (2)  clear  patterns  between 
the  metrics  and  the  resources  expended  are  visible 
when  both  resources  are  accounted  for,  (3)  meas¬ 
ures  of  the  structure  are  as  valuable  in  examining 
software  systems  as  measures  of  code  details,  and 
(4)  the  choice  of  which,  or  how  many,  software 
metrics  to  employ  in  practice  is  suggested  by  measures
of  “yield”  and  “coverage”.

The  code  metrics  used  were  LOC,  Halstead’s  E,  and 
McCabe’s  v(G).  Structure  metrics  used  were  Henry 
and  Kafura’s  information  flow,  McClure’s  invocation
complexity,  Woodfield’s  review  complexity,
and  Yau  and  Collofello’s  stability  measure.  The 
three  hybrid  measures  were  combinations  of  LOC 
with  the  metrics  of  Henry  and  Kafura,  Woodfield, 
and  Yau  and  Collofello,  respectively.  The  authors 
conclude  that  “The  interplay  between  and  among 
the  resources  and  factors  is  too  subtle  and  fluid  to 
be  observed  accurately  by  a  single  metric,  or  a
single  resource.” 

Kafura87 

Kafura,  D.  and  G.  R.  Reddy.  “The  Use  of  Software 
Complexity  Metrics  in  Software  Maintenance.” 
IEEE  Trans.  Software  Eng.  SE-13,  3  (March  1987), 
335-343. 

Abstract:  This  paper  reports  on  a  modest  study 
which  relates  seven  different  software  complexity 
metrics  to  the  experience  of  maintenance  activities 
performed  on  a  medium  size  software  system.  The 
seven  metrics  studied  are  the  same  as  those  used  in 
[Kafura85].  The  software  system  involved  is  a  single
user  relational  database  system,  written  in  Fortran. 
Three  different  versions  of  the  software  system  that 
evolved  over  a  period  of  three  years  were  analyzed 
in  this  study.  A  major  revision  of  the  system,  while 
still  in  its  design  phase,  was  also  analyzed. 

The  results  of  the  study  indicate:  1)  that  the  growth 
in  system  complexity  as  determined  by  the  software 
metrics  agree  with  the  general  character  of  the 
maintenance  tasks  performed  in  successive  versions;
2)  the  metrics  were  able  to  identify  the  improper
integration  of  functional  enhancements
made  to  the  system;  3)  the  complexity  values  of  the 
system  components  as  indicated  by  the  metrics  con¬ 
form  well  to  an  understanding  of  the  system  by 
people  familiar  with  the  system;  4)  an  analysis  of 
the  redesigned  version  of  the  system  showed  the 
usefulness  of  software  metrics  in  the  (re)design
phase  by  revealing  a  poorly  structured  component 
of  the  system. 

This  paper  reports  on  a  study  of  seven  different 
complexity  metrics  as  related  to  experience  in  soft¬ 
ware  maintenance.  The  research  involved  three  dif¬ 
ferent  versions  of  a  medium-sized  system  that 
evolved  over  a  period  of  three  years.  Conclusions 
include  the  statement  that  the  metrics  were  able  to 
identify  the  improper  integration  of  functional  en¬ 
hancements  made  to  the  system. 

Kemerer87 

Kemerer,  C.  F.  “An  Empirical  Validation  of  Software
Cost  Estimation  Models.”  Comm.  ACM  30,  5  (May  1987),
416-429.

Abstract:  Practitioners  have  expressed  concern 
over  their  inability  to  accurately  estimate  costs  as¬ 
sociated  with  software  development.  This  concern 
has  become  even  more  pressing  as  costs  associated 
with  development  continue  to  increase.  As  a  result, 
considerable  research  attention  is  now  directed  at 
gaining  a  better  understanding  of  the  software- 
development  process  as  well  as  constructing  and 
evaluating  software  cost  estimating  tools.  This 
paper  evaluates  four  of  the  most  popular  algorith¬ 
mic  models  used  to  estimate  software  costs  (SLIM, 
COCOMO,  Function  Points,  and  ESTIMACS).
Data  on  15  large  completed  business  data-processing
projects  were  collected  and  used  to  test
the  accuracy  of  the  models’  ex  post  effort  estima¬ 
tion.  One  important  result  was  that  Albrecht's 
Function  Points  effort  estimation  model  was  vali¬ 
dated  by  the  independent  data  provided  in  this  study 
[3].  The  models  not  developed  in  business
data-processing  environments  showed  significant  need
for  calibration.  As  models  of  the
software-development  process,  all  of  the  models  tested  failed
to  sufficiently  reflect  the  underlying  factors  affect¬ 
ing  productivity.  Further  research  will  be  required 
to  develop  understanding  in  this  area.

The  author  compares  results  of  four  cost  estimation 
models  on  a  set  of  15  large  (average  of  200 
KSLOC)  software  products,  all  developed  by  the 
ABC  consulting  firm  (anonymous).  The  models 
compared  were  Boehm’s  COCOMO,  Putnam’s 
SLIM,  Albrecht’s  Function  Points  (FP),  and 
Rubin’s  ESTIMACS.  Although  the  models  were 
developed  and  calibrated  with  very  different  data, 
the  author  seems  surprised  that  the  resulting  errors
in  predicted  person-months  are  large  (COCOMO
600%,  SLIM  771%,  FP  102%,  ESTIMACS  85%). 
The  author  concludes  that  models  developed  in  dif¬ 
ferent  environments  do  not  work  well  without 
recalibration  for  the  environment  where  they  are  to 
be  applied. 
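
The percentages quoted above are error magnitudes relative to actual effort. A minimal sketch of that calculation follows, assuming the usual magnitude of relative error (MRE) definition, 100 * |estimated - actual| / actual. The function names and the sample person-month figures are hypothetical; they are not Kemerer's data.

```python
def magnitude_of_relative_error(actual_pm: float, estimated_pm: float) -> float:
    """Return the relative estimation error as a percentage of actual effort."""
    if actual_pm <= 0:
        raise ValueError("actual effort must be positive")
    return 100.0 * abs(estimated_pm - actual_pm) / actual_pm


def mean_mre(projects) -> float:
    """Average the MRE over (actual, estimated) person-month pairs."""
    errors = [magnitude_of_relative_error(a, e) for a, e in projects]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    # Made-up (actual, estimated) pairs, for illustration only.
    sample = [(100.0, 700.0), (200.0, 100.0)]
    print(mean_mre(sample))
```

Because the error is divided by actual effort, a model that consistently overestimates can produce an average well above 100%, as with the uncalibrated COCOMO and SLIM results reported here.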

Knafl86 

Knafl,  G.  J.  and  J.  Sacks.  “Software  Development
Effort  Prediction  Based  on  Function  Points.”  Proc.
COMPSAC  86.  Washington,  D.  C.:  IEEE  Computer
Society  Press,  Oct.  1986,  319-325.

Abstract:  We  analyze  a  published  data  set  used  to 
predict  future  software  development  effort  in  terms 
of  function  points.  For  a  full  range  of  COBOL  proj¬ 
ect  sizes,  a  straight  line  model  is  inappropriate  as  is 
a  linear  regression  model  using  the  software  sci¬ 
ence  transform  of  function  points.  Confidence 
bands  based  on  alternate  robust  models  show  the 
untenability  of  the  straight  line  model.  Acceptable 
uncertainty  levels  require  large  prediction  bands  in¬ 
dicating  that  function  points  by  itself  is  insufficient 
for  precise  prediction. 

The  authors  analyze  the  data  set  used  by  Albrecht 
and  Gaffney  [Albrecht83].  Their  conclusion  is  that
the  function  point  measure  by  itself  is  insufficient 
for  precise  prediction. 

Lassez81 

Lassez,  J.-L.,  D.  Van  der  Knijff,  J.  Shepherd,  and
C.  Lassez.  “A  Critical  Examination  of  Software
Science.”  J.  Syst.  and  Software  2,  2  (June  1981),
105-112.

Abstract:  The  claims  that  software  science  could 
provide  an  empirical  basis  for  the  rationalization  of 
all  forms  of  algorithm  description  are  shown  to  be 
invalid  from  a  formal  point  of  view.  In  particular, 
the  conjectured  dichotomy  between  operators  and 
operands  is  shown  not  to  hold  over  a  wide  class  of 
languages.  An  experiment  that  investigated  dis¬ 
crepancies  between  the  level  measure  and  its  es¬ 
timator  is  described  to  show  that  its  failure  was  due 
to  shortcomings  in  the  theory.  One  cannot  obtain 
reliable  results  without  tampering  with  both  meas¬ 
ure  and  estimator  definitions. 

This  paper  is  a  critical  analysis  of  Halstead’s  theory. 
The  authors  conclude  that  his  fundamental  hy¬ 
potheses  are  not  applicable  over  the  broad  range 
claimed  by  Halstead. 

Levitin86

Levitin,  A.  V.  “How  To  Measure  Software  Size,  and
How  Not  To.”  Proc.  COMPSAC  86.  Washington,
D.  C.:  IEEE  Computer  Society  Press,  Oct.  1986,
314-318.

Abstract:  The  paper  suggests  a  list  of  criteria  desirable
for  a  measure  of  software  size.  The  principal
known  size  metrics  (source  lines  of  code,  the
number  of  statements,  Software  Science  length  and
volume,  and  the  number  of  tokens)  are  discussed
from  the  standpoint  of  these  general  criteria.  The 
analysis  indicates  that  the  number  of  tokens  is  supe¬ 
rior  over  the  other,  much  more  often  used  metrics. 

Levitin  compares  common  size  metrics,  such  as 
LOC,  number  of  statements,  and  Halstead’s  measures
n,  N,  and  V.  He  reports  that  N  (the  number  of
tokens)  is  superior  to  the  other  metrics  as  a  measure 
of  size. 
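
For reference, the Software Science size measures named above can be computed from a token stream as sketched below. The definitions (vocabulary n = n1 + n2, length N = N1 + N2, volume V = N log2 n) are Halstead's. How tokens are split into operators and operands is a counting-rule choice, and the split used for the sample statement here is only an assumption.

```python
import math


def halstead_size(operators, operands):
    """Return (n, N, V) for lists of operator and operand occurrences."""
    n1, n2 = len(set(operators)), len(set(operands))  # distinct tokens
    N1, N2 = len(operators), len(operands)            # total occurrences
    n = n1 + n2                                       # vocabulary
    N = N1 + N2                                       # length: total token count
    V = N * math.log2(n) if n > 0 else 0.0            # volume
    return n, N, V


if __name__ == "__main__":
    # Tokens of the statement `x = x + 1` under an assumed counting rule.
    operators = ["=", "+"]
    operands = ["x", "x", "1"]
    print(halstead_size(operators, operands))
```

For the sample statement the vocabulary is 4 and the length is 5, so the token count and the vocabulary are distinct quantities even for a single line.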

Li87

Li,  H.  F.  and  W.  K.  Cheung.  “An  Empirical  Study  of
Software  Metrics.”  IEEE  Trans.  Software  Eng.
SE-13,  6  (June  1987),  697-708.

Abstract:  Software  metrics  are  computed  for  the 
purpose  of  evaluating  certain  characteristics  of  the 
software  developed.  A  Fortran  static  source  code 
analyzer,  FORTRANAL,  was  developed  to  study  31
metrics,  including  a  new  hybrid  metric  introduced 
in  this  paper,  and  applied  to  a  database  of  255  pro¬ 
grams,  all  of  which  were  student  assignments. 
Comparisons  among  these  metrics  are  performed. 
Their  cross-correlation  confirms  the  internal  con¬ 
sistency  of  some  of  these  metrics  which  belong  to 
the  same  class.  To  remedy  the  incompleteness  of 
most  of  these  metrics,  the  proposed  metric  incorpo¬ 
rates  context  sensitivity  to  structural  attributes  ex¬ 
tracted  from  a  flow  graph.  It  is  also  concluded  that 
many  volume  metrics  have  similar  performance 
while  some  control  metrics  surprisingly  correlate 
well  with  typical  volume  metrics  in  the  test  samples 
used.  A  flexible  class  of  hybrid  metric  can  incor¬ 
porate  both  volume  and  control  attributes  in  assess¬ 
ing  software  complexity. 

The  authors  report  on  a  study  of  31  different  com¬ 
plexity  metrics  applied  to  a  database  of  255 
FORTRAN  programs  (all  student  assignments). 
They  claim  that  all  the  other  metrics  are  incomplete
and  propose  a  new  hybrid  metric  to  fill  the  gap. 
The  article  includes  an  interesting  classification  of 
complexity  metrics,  shown  in  the  form  of  a  chart.

Lister82 

Lister,  A.  M.  “Software  Science — The  Emperor’s
New  Clothes?”  Australian  Computer  J.  14,  2  (May
1982),  66-70.

Abstract:  The  emergent  field  of  software  science 
has  recently  received  so  much  publicity  that  it 
seems  appropriate  to  pose  the  question  above.  This 
paper  attempts  to  provide  an  answer  by  examining 
the  methodology  of  software  science,  and  by  pointing
out  apparent  anomalies  in  three  major  areas:
the  length  equation,  the  notion  of  potential  volume,
and  the  notion  of  language  level.  The  paper  con¬ 
cludes  that  the  emperor  is  in  urgent  need  of  a  good 
tailor. 

A  critical  review  of  Halstead’s  results  and  the  1978 
review  of  the  same  by  Fitzsimmons  and  Love. 
Halstead’s  results  for  N,  V*,  L,  and  E  are  all
criticized. 

McCabe76 

McCabe,  T.  J.  “A  Complexity  Measure.”  IEEE 
Trans.  Software  Eng.  SE-2,  4  (Dec.  1976),  308-320.

Abstract:  This  paper  describes  a  graph-theoretic 
complexity  measure  and  illustrates  how  it  can  be 
used  to  manage  and  control  program  complexity. 
The  paper  first  explains  how  the  graph-theory  con¬ 
cepts  apply  and  gives  an  intuitive  explanation  of  the 
graph  concepts  in  programming  terms.  The  control 
graphs  of  several  actual  Fortran  programs  are  then 
presented  to  illustrate  the  correlation  between  intui¬ 
tive  complexity  and  the  graph-theoretic  complexity. 
Several  properties  of  the  graph-theoretic  complexity 
are  then  proved  which  show,  for  example,  that  com¬ 
plexity  is  independent  of  physical  size  (adding  or 
subtracting  functional  statements  leaves  complexity 
unchanged)  and  complexity  depends  only  on  the  decision
structure  of  a  program.  The  issue  of  using
nonstructured  control  flow  is  also  discussed.  A 
characterization  of  nonstructured  control  graphs  is 
given  and  a  method  of  measuring  the  "structured¬ 
ness"  of  a  program  is  developed.  The  relationship 
between  structure  and  reducibility  is  illustrated 
with  several  examples. 

The  last  section  of  this  paper  deals  with  a  testing 
methodology  used  in  conjunction  with  the  com¬ 
plexity  measure;  a  testing  strategy  is  defined  that 
dictates  that  a  program  can  either  admit  of  a  cer¬ 
tain  minimal  testing  level  or  the  program  can  be 
structurally  reduced. 

McCabe’s  classic  paper  on  the  cyclomatic  com¬ 
plexity  of  a  computer  program.  This  is  an  excellent 
paper.  The  contents  are  well-described  by  the  ab¬ 
stract. 
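
The cyclomatic number is v(G) = e - n + 2p for a control graph with e edges, n nodes, and p connected components. The sketch below, with hypothetical node names, computes it for a single if-then-else and shows that inserting a straight-line statement leaves the value unchanged, as the abstract notes.

```python
def cyclomatic_complexity(edges, components=1):
    """Return e - n + 2p for a control graph given as (source, target) pairs."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2 * components


# One decision: cond branches to then/else, both rejoin at exit.
if_else = [
    ("cond", "then"),
    ("cond", "else"),
    ("then", "exit"),
    ("else", "exit"),
]

# Same decision with an extra sequential statement on the then-branch.
if_else_longer = [
    ("cond", "then"),
    ("cond", "else"),
    ("then", "extra_stmt"),
    ("extra_stmt", "exit"),
    ("else", "exit"),
]

if __name__ == "__main__":
    print(cyclomatic_complexity(if_else))
    print(cyclomatic_complexity(if_else_longer))
```

The extra statement adds one node and one edge, so the two contributions cancel; only a new decision (a node with more than one outgoing edge) raises the count.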

McCall77 

McCall,  J.  A.,  P.  K.  Richards,  and  G.  F.  Walters. 
Factors  in  Software  Quality,  Vol.  I,  II,  III:  Final 
Tech.  Report.  RADC-TR-77-369,  Rome  Air  Development
Center,  Air  Force  Systems  Command,  Griffiss
Air  Force  Base,  N.  Y.,  1977.

This  is  one  of  the  earliest,  often-referenced  works 
on  software  quality  factors.  The  quality  characteristics
identified  in  this  report  are  also  discussed  in
[Perlis81],  pages  204-206,  where  they  are  compared 
to  those  of  Boehm  et  al. 

Mohanty81

Mohanty,  S.  N.  “Software  Cost  Estimation:  Present 
and  Future.”  Software — Practice  and  Experience  11, 
2  (Feb.  1981),  103-121. 

Abstract:  The  state-of-the-art  in  software  cost  es¬ 
timation  is  reviewed.  The  estimated  cost  of  a  soft¬ 
ware  system  varies  widely  with  the  model  used. 
Some  variation  in  cost  estimation  is  attributable  to 
the  anomalies  in  the  cost  data  base  used  in  developing
the  model.  The  other  variations,  it  is  claimed,
are  due  to  the  presence  or  absence  of  certain
'qualities'  in  the  final  product.  These  qualities  are
measures  of  'goodness'  in  design,  development  and
test-integration  phases  of  software.  To  consider 
quality  as  a  driver  of  software  cost,  the  author  sug¬ 
gests  an  association  between  cost  and  quality  and 
proposes  a  way  to  use  quality  metrics  to  estimate 
software  cost. 

Mohanty  reviews  the  state-of-the-art  in  software 
cost  estimation.  More  than  15  models  are  dis¬ 
cussed,  including  those  of  Wolverton,  Price-S,  and 
Walston/Felix.  The  author  lists  49  factors  that  in¬ 
fluence  software  development  costs. 

Musa75 

Musa,  J.  D.  “A  Theory  of  Software  Reliability  and 
Its  Application.”  IEEE  Trans.  Software  Eng.  SE-1,  3
(Sept.  1975),  312-327.  Reprinted  in  [Basili80], 
194-212. 

Abstract:  An  approach  to  a  theory  of  software 
reliability  based  on  execution  time  is  derived.  This 
approach  provides  a  model  that  is  simple,  intui¬ 
tively  appealing,  and  immediately  useful.  The  the¬ 
ory  permits  the  estimation,  in  advance  of  a  project, 
of  the  amount  of  testing  in  terms  of  execution  time 
required  to  achieve  a  specified  reliability  goal 
[stated  as  a  mean  time  to  failure  (MTTF)].  Execution
time  can  then  be  related  to  calendar  time,  permitting
a  schedule  to  be  developed.  Estimates  of
execution  time  and  calendar  time  remaining  until 
the  reliability  goal  is  attained  can  be  continually 
remade  as  testing  proceeds,  based  only  on  the 
length  of  execution  time  intervals  between  failures. 
The  current  MTTF  and  the  number  of  errors 
remaining  can  also  be  estimated.  Maximum  likeli¬ 
hood  estimation  is  employed,  and  confidence  inter¬ 
vals  are  also  established.  The  foregoing  informa¬ 
tion  is  obviously  very  valuable  in  scheduling  and 
monitoring  the  progress  of  program  testing.  A  pro¬ 
gram  has  been  implemented  to  compute  the  forego¬ 
ing  quantities.  The  reliability  model  that  has  been 
developed  can  be  used  in  making  system  tradeoffs 
involving  software  or  software  and  hardware  com¬ 
ponents.  It  also  provides  a  soundly  based  unit  of 
measure  for  the  comparative  evaluation  of  various 
programming  techniques  that  are  expected  to  enhance
reliability.  The  model  has  been  applied  to
four  medium-sized  development  projects,  all  of
which  have  completed  their  life  cycles.  Measure¬ 
ments  taken  of  MTTF  during  operation  agree  well 
with  the  predictions  made  at  the  end  of  system  test. 

As  far  as  the  author  can  determine,  these  are  the 
first  times  that  a  software  reliability  model  was 
used  during  software  development  projects.  The 
paper  reflects  and  incorporates  the  practical  expe¬ 
rience  gained. 

The  author  develops  the  basic  concept  of  software 
reliability  and  discusses  its  application  to  actual 
projects.  This  is  one  of  the  early  papers  by  this 
author  on  this  subject.  Later  work  is  reported  in 
[Musa80]  and  [Musa87].
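
To make the idea of estimating the execution time needed to reach a reliability goal concrete, the sketch below uses the basic execution-time model in the notation of the later book [Musa87] (initial failure intensity lam0, total expected failures nu0), with the goal stated as a failure intensity objective. This is not the exact formulation of the 1975 paper; the choice of notation, the function names, and the parameter values are assumptions for illustration.

```python
import math


def failure_intensity(tau, lam0, nu0):
    """Failure intensity after tau units of execution time.

    Basic execution-time model: lam(tau) = lam0 * exp(-lam0 * tau / nu0).
    """
    return lam0 * math.exp(-lam0 * tau / nu0)


def additional_execution_time(lam_present, lam_objective, lam0, nu0):
    """Execution time still needed to move from the present failure
    intensity to the objective: (nu0 / lam0) * ln(lam_present / lam_objective).
    """
    if not 0 < lam_objective <= lam_present <= lam0:
        raise ValueError("expected 0 < objective <= present <= initial intensity")
    return (nu0 / lam0) * math.log(lam_present / lam_objective)


if __name__ == "__main__":
    # Made-up parameters: 10 failures per CPU hour initially, 100 failures in total.
    lam0, nu0 = 10.0, 100.0
    needed = additional_execution_time(lam0, 1.0, lam0, nu0)
    print(needed, failure_intensity(needed, lam0, nu0))
```

The two functions are inverses of each other: running for the computed additional time from the start of test brings the modeled failure intensity down to the objective.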

Musa80 

Musa,  J.  D.  “Software  Reliability  Measurement.”  J. 
Syst.  and  Software  1,  3  (1980),  223-241.  Reprinted 
in  [Basili80],  194-212.

Abstract:  The  quantification  of  software  reliability 
is  needed  for  the  system  engineering  of  products 
involving  computer  programs  and  the  scheduling 
and  monitoring  of  software  development.  It  is  also 
valuable  for  the  comparative  evaluation  of  the  ef¬ 
fectiveness  of  various  design,  coding,  testing,  and 
documentation  techniques.  This  paper  outlines  a 
theory  of  software  reliability  based  on  execution  or 
CPU  time,  and  a  concomitant  model  of  the  testing 
and  debugging  process  that  permits  execution  time 
to  be  related  to  calendar  time.  The  estimation  of 
parameters  of  the  model  is  discussed.  Application 
of  the  theory  in  scheduling  and  monitoring  software 
projects  is  described,  and  data  taken  from  several 
actual  projects  are  presented. 

This  paper  further  develops  the  basic  concepts  of 
software  reliability  and  its  measurement.  The  au¬ 
thor  has  developed  these  concepts  much  more  fully 
in  [Musa87].

Musa87 

Musa,  J.  D.,  A.  Iannino,  and  K.  Okumoto.  Software
Reliability:  Measurement,  Prediction,  Application.
New  York:  McGraw-Hill,  1987.

gram  of  lesser  complexity  than  for  a  more  complex
program.  This  paper  discusses  these  anomalies,  de¬ 

eliminate  them,  and  applies  the  measure  to  several
programs  in  the  literature. 

This  book  provides  an  extensive  review  of  the  status
of  software  metrics  as  of  1981  or  slightly  before. 
Specifically,  it  contains  a  number  of  state-of-the-art 
evaluations,  as  well  as  recommendations  for  re¬ 
search  initiatives  in  related  areas  of  software 
metrics.  For  reference  purposes,  it  also  contains  an 
extensive  annotated  bibliography  of  more  than  350 

Engineering.  New  York:  IEEE,  Sept.  1982,  94-103.

Abstract:  Experiments  with  quantitative  assessment 
and  prediction  of  software  reliability  are  presented. 
The  experiments  are  based  on  the  analysis  of  the 
error  and  the  complexity  characteristics  of  a  large 
set  of  programs.  The  first  part  of  the  study  con¬ 
cerns  the  data  collection  process  and  the  analysis  of 
the  error  data  and  complexity  measures.  The 
relationships  between  the  complexity  profile  and  the 
error  data  of  the  procedures  of  the  programs  are 
then  investigated  with  the  help  of  discriminant 
statistical  analysis  technique.  The  results  of  these 
analyses  show  that  an  estimation  can  be  derived 
from  the  analysis  of  its  complexity  profile.

The  software  used  in  this  study  consisted  of  a 
family  of  compilers,  all  written  in  the  LTR  lan¬ 
guage.  The  compiler  consisted  of  a  kernel,  imple¬ 
mented  by  seven  compilation  units,  and  a  code  gen¬ 
erator,  implemented  by  four  compilation  units. 
These  programs  were  developed  by  a  number  of
programmers  over  an  extended  period  of  time,  from 
1972  through  1977  and  beyond.  Both  textual 
(Halstead)  complexity  metrics  and  structural  (Mc¬ 
Cabe,  reachability,  paths,  etc.)  complexity  metrics 
were  investigated.  An  error  data  file  containing 
data  on  over  one  thousand  errors  was  created,  al¬ 
though  long  after  the  work  was  done,  in  some  cases. 
Observations:  All  complexity  measures,  except  the 
normalized  cyclomatic  and  cocyclomatic  numbers, 
discriminated  between  procedures  with  no  errors 
and  programs  with  errors.  Thus,  although  the 
cyclomatic  number  discriminates  in  the  same  man¬ 
ner,  the  authors  conclude  that  this  is  only  because  of 
its  high  correlation  with  the  size  and  volume 
metrics.  In  addition,  the  discriminating  effect  ap¬ 
peared  to  be  maximal  with  regard  to  errors  created 
during  the  design  specification  or  coding  stages.  In 
ranking  the  measures  as  to  discriminating  effects, 
the  vocabulary,  n,  appears  at  the  top  level  of  the 
decision  tree. 

Putnam,  L.  H.  “A  General  Empirical  Solution  to  the
Macro  Software  Sizing  and  Estimating  Problem.”
IEEE  Trans.  Software  Eng.  SE-4,  4  (July  1978),
345-361.

Abstract:  Application  software  development  has 
been  an  area  of  organizational  effort  that  has  not 
been  amenable  to  the  normal  managerial  and  cost 
controls.  Instances  of  actual  costs  of  several  times 
the  initial  budgeted  cost,  and  a  time  to  initial  opera¬ 
tional  capability  sometimes  twice  as  long  as 
planned  are  more  often  the  case  than  not.  A  mac¬ 
romethodology  to  support  management  needs  has 
now  been  developed  that  will  produce  accurate  es¬ 
timates  of  manpower,  costs,  and  times  to  reach  cri¬ 
tical  milestones  of  software  projects.  There  are 
four  parameters  in  the  basic  system  and  these  are  in 
terms  managers  are  comfortable  working  with — ef¬ 
fort,  development  time,  elapsed  time,  and  a  state-of- 
technology  parameter.  The  system  provides  manag¬ 
ers  sufficient  information  to  assess  the  financial  risk 
and  investment  value  of  a  new  software  develop¬ 
ment  project  before  it  is  undertaken  and  provides 
techniques  to  update  estimates  from  the  actual  data 
stream  once  the  project  is  underway.  Using  the 
technique  developed  in  the  paper,  adequate  analysis 
for  decisions  can  be  made  in  an  hour  or  two  using 
only  a  few  quick  reference  tables  and  a  scientific 
pocket  calculator. 

The  author  studied  data  on  large  systems  developed 
by  the  U.  S.  Army  Computer  Systems  Command, 
which  develops  application  software  in  the  logistic, 
personnel,  financial,  force  accounting,  and  facilities 
engineering  areas.  Systems  studied  ranged  in  size 
from  30  man-years  of  development  and  maintenance
effort  to  over  1,000  man-years.  The  Norden/Rayleigh
model  was  used  to  derive  an  estimating
equation  relating  the  size  (LOC)  of  the  project  to 
the  product  of  a  state  of  technology  factor,  the  cube 
root  of  the  applied  effort,  and  the  development  time 
to  the  4/3  power.  This  was  found  to  work  fairly 
well  in  the  environment  for  which  the  data  were 
available.  However,  the  author  states  that  the  es¬ 
timators  developed  here  probably  cannot  be  used  by 
other  software  houses,  “at  least  not  without  great 
care  and  considerable  danger,”  because  of  the  dif¬ 
ference  in  standards  and  procedures.  The  author 
later  developed  this  basic  model  into  the  proprietary 
product  SLIM  (Software  Life-cycle  Management)
[Conte86]. 
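
The relation described above can be written S = C_k * K^(1/3) * t_d^(4/3), where S is size in lines of code, K is applied effort, t_d is development time, and C_k is the state-of-technology factor. The sketch below evaluates it and solves it for effort, assuming effort in man-years and time in years. The function names and the numeric values (including the C_k value) are hypothetical, chosen only to show the steep effort penalty for compressing the schedule.

```python
def estimated_size(c_k: float, effort: float, dev_time: float) -> float:
    """Size implied by S = C_k * K**(1/3) * t_d**(4/3)."""
    return c_k * effort ** (1.0 / 3.0) * dev_time ** (4.0 / 3.0)


def required_effort(size: float, c_k: float, dev_time: float) -> float:
    """Solve the same relation for effort: K = (S / (C_k * t_d**(4/3)))**3."""
    if c_k <= 0 or dev_time <= 0:
        raise ValueError("c_k and dev_time must be positive")
    return (size / (c_k * dev_time ** (4.0 / 3.0))) ** 3


if __name__ == "__main__":
    # Made-up inputs, for illustration only.
    size, c_k = 2000.0, 1000.0
    for years in (1.0, 0.5):
        print(years, required_effort(size, c_k, years))
```

Since effort varies as the inverse fourth power of development time in this relation, halving the schedule for a fixed size and technology factor multiplies the required effort by sixteen.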

and  Life-Cycle  Control:  Getting  the  Software
Numbers.  New  York:  IEEE,  1980. 

This  is  an  excellent  tutorial  on  software  costing  and
estimating  techniques,  as  they  had  developed  to  the 
late  1970’s.  It  contains  contributions  from  most  of 
the  major  figures  in  this  area  up  to  this  time,  as  can 
be  seen  from  the  table  of  contents,  including  Put¬ 

Interpretation Through Experimentation.” Proc.
COMPSAC 86. Washington, D. C.: IEEE Computer
Society Press, Oct. 1986, 368-374.

Abstract:  This  paper  poses  several  conjectures  de¬ 
rived  from  the  current  view  of  Software  Metrics  and 
then  analyzes  these  conjectures  through  the  use  of 
source  code  metrics  applied  to  two  medium  size 
software  systems.  The  analysis  attempts  to  deter¬ 
mine  the  robustness  of  several  metrics,  the  infor¬ 
mation  conveyed  through  them  and  how  that  infor¬ 
mation  could  be  used  for  software  management  pur¬ 
poses.  One  important  use  observed  is  the  dis¬ 
criminant  power  of  the  metrics  when  software  com¬ 
ponents  are  grouped  together  into  sets  of  common 
characteristics  to  statistically  distinguish  between 
the  groups. 

The  authors  attempt  to  determine  the  robustness  of 
metrics  such  as  McCabe’s  v(G),  LOC,  etc.,  for  a 
total  of  15  code  metrics  plus  Kafura  and  Henry’s 
information  flow  metrics.  Results  are  not  easily 
summarized. 

Rombach87 

Rombach, H. D. “A Controlled Experiment on the
Impact of Software Structure on Maintainability.”
IEEE Trans. Software Eng. SE-13, 3 (March 1987),
344-354.

Abstract: This paper describes a study on the impact
of software structure on maintainability aspects
such as comprehensibility, locality, modifiability,
and reusability in a distributed system environment.
The study was part of a project at the
University of Kaiserslautern, West Germany, to design
and implement LADY, a LAnguage for
Distributed sYstems. The study addressed the impact
of software structure from two perspectives.
The language designer's perspective was to evaluate
the general impact of the set of structural concepts
chosen for LADY on the maintainability of
software systems implemented in LADY. The language
user's perspective was to derive structural
criteria  (metrics),  measurable  from  LADY  systems, 
that  allow  the  explanation  or  prediction  of  the  soft¬ 
ware  maintenance  behavior.  A  controlled  mainte¬ 
nance  experiment  was  conducted  involving  twelve 
medium-size  distributed  software  systems;  six  of 
these  systems  were  implemented  in  LADY,  the  other 
six  systems  in  an  extended  version  of  sequential 
Pascal.  The  benefits  of  the  structural  LADY  con¬ 
cepts  were  judged  based  on  a  comparison  of  the 
average  maintenance  behavior  of  the  LADY  systems 
and  the  Pascal  systems;  the  maintenance  metrics 
were  derived  by  analyzing  the  interdependence  be¬ 
tween  structure  and  maintenance  behavior  of  each 
individual  LADY  system. 

The  author  reports  results  of  a  controlled  experi¬ 
ment  investigating  the  effect  of  software  structure 
on  the  maintainability  of  software  in  a  distributed 
system.  The  experiments  were  run  on  12  systems  of 
1.5 to 15 KLOC. The software systems were developed
in C-TIP (an extended Pascal) and LADY (a
LAnguage  for  Distributed  sYstems).  Results  in¬ 
dicated  that  complexity  measures  based  on  infor¬ 
mation  flows  are  useful  predictors  of  maintainabil¬ 
ity,  including  comprehensibility,  locality  and  modi¬ 
fiability.  Results  with  regard  to  reusability  were 
inconclusive. 

Rubin, H. A. “Macro-Estimation of Software Development
Parameters: The ESTIMACS System.”
Proc. SOFTFAIR: A Conference on Software Development
Tools, Techniques, and Alternatives. New
York: IEEE, July 1983, 109-118.

Abstract:  System  developers  are  continually  faced 
with  the  problem  of  being  asked  to  provide  reliable 
estimates  early  in  the  software  development  proc¬ 
ess,  often  before  any  of  the  requirements  are  known. 
The ESTIMACS models offer a solution to this problem
by relating gross business specifications to the
estimate  dimensions  of  effort  hours,  staff,  cost, 
hardware,  risk  and  portfolio  effects.  In  addition, 
their  implementation  structure  takes  the  user 
through  a  programmed  learning  experience  in  un¬ 
derstanding  the  estimates  produced. 

The author describes his recently developed ESTIMACS
system for use in estimating, planning,
and  controlling  the  software  development  life  cycle. 
The  model  includes  estimators  for  development  ef¬ 
fort,  staffing  requirements,  costs,  hardware  require¬ 
ments,  risk  assessment,  and  total  resource  demands. 

Rubin, H. A. “A Comparison of Software Cost Estimation
Tools.” System Development 7, 5 (May
1987), 1-3.

Abstract:  There  are  only  a  handful  of  software  cost 
estimation  tools  that  are  in  general  use  today.  For 
the  8th  International  Conference  on  Software  Engi¬ 
neering  held  in  August  1985,  the  authors,  or 
representatives,  of  the  most  "popular"  tools  were 
presented  with  a  common  problem  to  analyze  as 
basis  for  comparison.  In  this  context,  each  was 
asked  to  address  his  analysis  approach,  input 
parameters  used,  parameters  not  used,  and  results 
generated.  This  article  contains  the  statement  of 
the  problem,  a  summary  of  the  results  provided  by 
each  participant,  and  a  discussion  of  the  implica¬ 
tion  of  the  results  for  those  embarking  on  estimation 
programs  within  their  own  IS  organizations. 

The  author  reports  on  a  comparison  of  models  JS-2, 
SLIM,  GECOMO,  ESTIMACS  (by  author),  PCOC, 
and  SPQR/10,  all  applied  to  the  same  cost  estima¬ 
tion  problem.  Details  of  the  results  are  not  given, 
but  Rubin  states  that  the  results  “varied  in  a  range  of 
almost  8  to  1.” 

Ruston79 

Ruston,  H.  (Workshop  Chair).  Workshop  on  Quanti¬ 
tative  Software  Models  for  Reliability,  Complexity 
and  Cost:  An  Assessment  of  the  State  of  the  Art.  New 
York:  IEEE,  1979. 

These proceedings of a workshop on models of the
software process involving reliability, complexity,
and cost factors are a good collection of work done
up to that time (late 1970s). Although now somewhat
out-of-date, they still serve as a good reference
for work done in this area prior to 1980.

Shen83 

Shen,  V.  Y.,  S.  D.  Conte,  and  H.  E.  Dunsmore. 
“Software  Science  Revisited:  A  Critical  Analysis  of 
the  Theory  and  Its  Empirical  Support.”  IEEE  Trans. 
Software  Eng.  SE-9,  2  (March  1983),  155-165. 

Abstract: The theory of software science was developed
by the late M. H. Halstead of Purdue University
during the early 1970's. It was first presented
in  unified  form  in  the  monograph  Elements  of  Soft¬ 
ware  Science  published  by  Elsevier  North-Holland 
in  1977.  Since  it  claimed  to  apply  scientific  meth¬ 
ods  to  the  very  complex  and  important  problem  of 
software production, and since experimental
evidence supplied by Halstead and others seemed to
support  the  theory,  it  drew  widespread  attention 
from  the  computer  science  community. 

Some  researchers  have  raised  serious  questions 
about  the  underlying  theory  of  software  science.  At 
the  same  time,  experimental  evidence  supporting 
some  of  the  metrics  continue  to  be  presented.  This 
paper  is  a  critique  of  the  theory  as  presented  by 
Halstead  and  a  review  of  experimental  results  con¬ 
cerning  software  science  metrics  published  since 
1977. 

This  paper  is  a  critical  review  of  Halstead’s  soft¬ 
ware  science  and  its  empirical  support.  Among 
other things, shortcomings in the derivations of N,
V* and T are noted.

Shen85 

Shen, V. Y., T. J. Yu, S. M. Thebaut, and L. R. Paulsen.
“Identifying Error-Prone Software — An Empirical
Study.” IEEE Trans. Software Eng. SE-11, 4
(April 1985), 317-324.

Abstract:  A  major  portion  of  the  effort  expended  in 
developing  commercial  software  today  is  associated 
with program testing. Schedule and/or resource
constraints  frequently  require  that  testing  be  con¬ 
ducted  so  as  to  uncover  the  greatest  number  of  er¬ 
rors  possible  in  the  time  allowed.  In  this  paper  we 
describe  a  study  undertaken  to  assess  the  potential 
usefulness  of  various  product-  and  process-related 
measures  in  identifying  error-prone  software.  Our 
goal  was  to  establish  an  empirical  basis  for  the  effi¬ 
cient  utilization  of  limited  testing  resources  using 
objective,  measurable  criteria.  Through  a  detailed 
analysis  of  three  software  products  and  their  error 
discovery  histories,  we  have  found  simple  metrics 
related  to  the  amount  of  data  and  the  structural 
complexity  of  programs  to  be  of  value  for  this  pur¬ 
pose. 

This  study  involved  five  products  developed  and 
released  since  1980,  in  three  different  languages 
(assembler,  Pascal  and  PL/S).  The  authors  report 
that  the  best  predictors  of  defect  rates  at  the  end  of 
program  design  and  program  coding  phases  were 
Halstead’s n1 and n2, and the total number of decisions,
DE. At the end of the software testing
period,  the  best  defect  indicating  metrics  were 
found  to  be  Halstead’s  n2  and  the  actual  number  of 
program  trouble  memoranda  (PTMs).  The  authors 
also  state  that  “Our  study  of  error  density  shows 
that  this  measure  is,  in  general,  a  poor  size- 
normalized  index  of  program  quality.  Its  use  in 
comparing  the  quality  of  either  programs  or  pro¬ 
grammers  without  regard  to  related  factors  such  as 
complexity  and  size  is  ill-advised.” 

Shepperd88

Shepperd, M. “A Critique of Cyclomatic Complexity
as a Software Metric.” Software Engineering
J. 3, 2 (March 1988), 30-36.

Abstract:  McCabe's  cyclomatic  complexity  metric 
is  widely  cited  as  a  useful  predictor  of  various  soft¬ 
ware  attributes  such  as  reliability  and  development 
effort.  This  critique  demonstrates  that  it  is  based 
upon  poor  theoretical  foundations  and  an  inade¬ 
quate  model  of  software  development.  The  argu¬ 
ment  that  the  metric  provides  the  developer  with  a 
useful  engineering  approximation  is  not  borne  out 
by  the  empirical  evidence.  Furthermore,  it  would 
appear  that  for  a  large  class  of  software  it  is  no 
more  than  a  proxy  for,  and  in  many  cases  is  outper¬ 
formed  by,  lines  of  code. 

The  author’s  criticisms,  very  briefly  summarized, 
include  the  following.  Theoretical:  1)  simplistic  ap¬ 
proach  to  decision  counting,  2)  independence  of 
generally  accepted  program  structuring  techniques, 
and  3)  arbitrary  impact  of  program  modularization. 
Empirical:  Studies  to  date  do  not  establish  validity 
of  v(G)  as  a  reliable  measure  of  any  observed  soft¬ 
ware  properties.  Responses  can  be  summarized  as 
follows.  Theoretical:  v(G)  is  a  simple,  objective 
measure  of  one  aspect  of  a  software  product.  It  is 
probably  unrealistic  to  expect  it  to  correlate  well 
with  any  simply  observable,  gross  characteristic  of 
software,  since  such  characteristics  are  determined 
by  a  large  number  of  factors,  many  of  which  are 
unknowable,  unmeasurable,  or  uncontrollable.  Em¬ 
pirical:  For  similar  reasons,  empirical  studies  have 
failed  to  yield  definitive  results  regarding  the  valid¬ 
ity  of  v(G)  as  a  software  metric,  as  pointed  out  by 
the  author.  A  fundamental  problem,  noted  in  the 
article,  is  the  lack  of  any  explicit  underlying  model, 
without  which  attempts  at  empirical  validation  are 
meaningless. 
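
For readers unfamiliar with the metric under discussion, McCabe's v(G) is computed from a program's control-flow graph as v(G) = e - n + 2p, where e is the number of edges, n the number of nodes, and p the number of connected components. The sketch below is illustrative only; the function name and the sample graph are assumptions, not material from the article.

```python
from typing import Iterable, Tuple


def cyclomatic_complexity(
    edges: Iterable[Tuple[int, int]], num_nodes: int, num_components: int = 1
) -> int:
    """McCabe's v(G) = e - n + 2p for a control-flow graph."""
    return len(list(edges)) - num_nodes + 2 * num_components


if __name__ == "__main__":
    # Hypothetical graph: an if/else (node 0) followed by a while loop (node 3).
    flow_graph = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4), (4, 3), (3, 5)]
    print(cyclomatic_complexity(flow_graph, num_nodes=6))
```

For a single structured routine the result equals the number of binary decisions plus one, which is the basis of the "simplistic approach to decision counting" criticism summarized above.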

Stetter84 

Stetter, F. “A Measure of Program Complexity.”
Computer Languages 9, 3-4 (1984), 203-208.

Abstract:  The  author  proposes  a  measure  of  pro¬ 
gram  complexity  which  takes  into  account  both  the 
relationship between statements and the relationships
between statements and data objects (constants
and variables). This measure, called program
flow complexity, can be calculated from the
source  text  of  a  program  in  an  easy  way. 

This  paper  is  a  review  of  McCabe’s  complexity 
measure and Myers’s extension to it. The author
proposes  a  “cyclomatic  flow  complexity”  that  is 
claimed  to  eliminate  the  shortcomings  of  the  former 
metrics. 


Symons88 

Symons,  Charles  R.  “Function  Point  Analysis:  Dif¬ 
ficulties  and  Improvements.”  IEEE  Trans.  Software 
Eng.  14,  1  (Jan.  1988),  2-11. 

Abstract:  The  method  of  Function  Point  Analysis 
was  developed  by  Allan  Albrecht  to  help  measure 
the  size  of  a  computerized  business  information  sys¬ 
tem.  Such  sizes  are  needed  as  a  component  of  the 
measurement  of  productivity  in  system  development 
and  maintenance  activities,  and  as  a  component  of 
estimating  the  effort  needed  for  such  activities. 
Close  examination  of  the  method  shows  certain 
weaknesses,  and  the  author  proposes  a  partial  al¬ 
ternative.  The  paper  describes  the  principles  of  this 
"Mark  II"  approach,  the  results  of  some  measure¬ 
ments  of  actual  systems  to  calibrate  the  Mark  II 
approach,  and  conclusions  on  the  validity  and  ap¬ 
plicability  of  function  point  analysis  generally. 

Symons  presents  a  critical  review  of  Albrecht’s  FP 
methodology,  pointing  out  several  perceived 
shortcomings.  He  concludes  that  the  method  was 
developed  in  a  particular  environment  and  is  un¬ 
likely  to  be  valid  for  more  general  applications.  He 
then  proceeds  to  develop  an  alternative  formulation, 
for  what  he  calls  “Mark  II”  Function  Points. 
Whereas  Albrecht’s  FP  formula  involves  inputs, 
outputs,  internal  files,  external  files,  and  external 
inquiries,  the  author’s  new  formula  involves  only 
inputs,  outputs,  and  entities.  In  addition,  the  author 
introduces  six  new  factors  into  the  computation  of 
the  TCF  (Technical  Complexity  Factor),  thus  rais¬ 
ing  the  total  number  of  such  factors  from  14  to  20. 
Although  the  author  may  have  provided  additional 
rationale  for  the  new  formulation,  the  net  result 
seems  to  be  a  relatively  minor  modification  of  the 
original  FP  formulas.  Furthermore,  the  new  Mark  II 
FP formulas suffer from the same type of counting
difficulties  and  lack  of  universality  for  which  the 
original  formulas  were  criticized. 
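
To make the comparison concrete, the sketch below computes Albrecht-style function points from the five component counts and a 14-factor TCF, and a Mark II-style unadjusted count from inputs, entities, and outputs. The Albrecht weights shown are the commonly quoted average-complexity values, and TCF = 0.65 + 0.01 * (sum of ratings) is the usual form; the Mark II weights are deliberately left as caller-supplied parameters because their calibrated values are not given here. All function and parameter names are hypothetical.

```python
from typing import Sequence

# Commonly quoted average-complexity weights for Albrecht's five components.
ALBRECHT_WEIGHTS = {
    "inputs": 4,
    "outputs": 5,
    "inquiries": 4,
    "internal_files": 10,
    "external_files": 7,
}


def albrecht_fp(counts: dict, ratings: Sequence[int]) -> float:
    """Unadjusted count times TCF, where TCF = 0.65 + 0.01 * sum(ratings)."""
    if len(ratings) != 14:
        raise ValueError("Albrecht's TCF uses 14 factors rated 0..5")
    ufp = sum(ALBRECHT_WEIGHTS[name] * counts[name] for name in ALBRECHT_WEIGHTS)
    return ufp * (0.65 + 0.01 * sum(ratings))


def mark2_unadjusted(inputs: int, entities: int, outputs: int,
                     w_in: float, w_ent: float, w_out: float) -> float:
    """Mark II-style weighted sum; the weights must come from calibration data."""
    return w_in * inputs + w_ent * entities + w_out * outputs


if __name__ == "__main__":
    sample = {"inputs": 10, "outputs": 5, "inquiries": 4,
              "internal_files": 3, "external_files": 2}
    print(albrecht_fp(sample, [3] * 14))
```

Both forms are weighted sums over counted items, which illustrates why the counting difficulties noted above carry over from one formulation to the other.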

Tausworthe81

Tausworthe,  R.  C.  Deep  Space  Network  Software 
Cost  Estimation  Model.  TR  #81-7,  Jet  Propulsion 
Lab,  Pasadena,  Calif.,  1981. 

Abstract:  This  report  presents  a  parametric  soft¬ 
ware  cost  estimation  model  prepared  for  JPL  Deep 
Space  Network  (DSN)  Data  Systems  implementation 
tasks.  The  resource  estimation  model  modifies  and 
combines  a  number  of  existing  models,  such  as 
those  of  the  General  Research  Corp.,  Doty  Associ¬ 
ates,  IBM  (Walston-Felix),  Rome  Air  Development 
Center,  University  of  Maryland,  and  Rayleigh- 
Norden-Putnam.  The  model  calibrates  the  task 
magnitude  and  difficulty,  development  environment, 
and  software  technology  effects  through  prompted 
responses  to  a  set  of  approximately  50  questions. 
Parameters  in  the  model  are  adjusted  to  fit  JPL 
software life-cycle statistics. The estimation model
output scales a standard DSN Work Breakdown
Structure, which is then input to a PERT/CPM system,
producing a detailed schedule and resource
budget for the project being planned.

The  above  abstract  is  quoted  from  DACS  Document 
#MBIB-1, “The DACS Measurement Annotated
Bibliography, A Bibliography of Software Measurement
Literature,” May 1986.

Thebaut84 

Thebaut, S. M. and V. Y. Shen. “An Analytic
Resource Model For Large-Scale Software Development.”
Information Processing and Management 20,
1-2 (1984), 293-315.

Abstract:  Recent  work  conducted  by  members  of 
the  Purdue  software  metrics  research  group  has 
focused  on  the  complexity  associated  with  coor¬ 
dinating  the  activities  of  persons  involved  in  large- 
scale  programming  efforts.  A  resource  model  is 
presented  which  is  designed  to  reflect  the  impact  of 
this  complexity  on  the  economics  of  software  devel¬ 
opment.  The  model  is  based  on  a  formulation  in 
which  development  effort  is  functionally  related  to 
measures  of  product  size  and  manloading.  The  par¬ 
ticular  formulation  used  is  meant  to  suggest  a  logi¬ 
cal  decomposition  of  development  effort  into  com¬ 
ponents  related  to  the  independent  programming 
activity  of  individuals  and  to  the  overhead  associ¬ 
ated  with  the  required  information  flow  within  a 
programming  team.  The  model  is  evaluated  in  light 
of  acquired  data  reflecting  a  large  number  of  com¬ 
mercially  developed  software  products  from  two 
separate  sources.  Additional  sources  of  data  are 
actively  being  sought.  Although  strongly  analytic  in 
nature,  the  model's  performance  is,  for  the  avail¬ 
able  data,  at  least  as  good  in  accounting  for  the 
observed  variability  in  development  effort  as  some 
highly  publicized  empirically  based  models  for  com¬ 
parable  complexity.  It  is  argued,  however,  that  the 
model’s  principal  strength  lies  not  in  its  data  fitting 
ability, but rather in its straightforward and intuitively
appealing representation of relationships involving
manpower, time, and effort.

The  cooperative  programming  model  (COPMO)  is 
proposed  in  this  article.  In  this  model,  the  equation 
for  total  effort  includes  two  terms,  one  correspond¬ 
ing  to  the  effort  expended  in  programming-related 
activities  by  individuals  and  the  other  corresponding 
to  the  effort  expended  in  coordinating  these  activi¬ 
ties  among  all  programming  team  members.  As 
noted  above,  attempts  to  validate  the  model  against 
empirical  data  indicate  that  the  model  compares 
favorably  with  other  models  of  comparable  com¬ 
plexity,  while  possessing  a  more  satisfying  intuitive 
basis.  NASA  and  Boehm’s  data  sets  were  used  to 
compare  the  model  with  the  COCOMO  and  Putnam 
models. 
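
The two-term structure described above can be sketched as follows. The functional form used here (a linear programming term in size plus a power-law coordination term in average staff level) and every coefficient are assumptions chosen for illustration; the article's own calibrated form and values should be consulted before any real use.

```python
from typing import Tuple


def copmo_effort(size_kloc: float, avg_staff: float,
                 a: float, b: float, c: float, d: float) -> Tuple[float, float, float]:
    """Return (programming, coordination, total) effort.

    programming  = a + b * size        (independent work by individuals)
    coordination = c * avg_staff ** d  (team communication overhead)
    """
    programming = a + b * size_kloc
    coordination = c * avg_staff ** d
    return programming, coordination, programming + coordination


if __name__ == "__main__":
    # Hypothetical coefficients, not calibrated against any data set.
    prog, coord, total = copmo_effort(10.0, 4.0, a=0.0, b=2.0, c=1.0, d=1.5)
    print(prog, coord, total)
```

Separating the terms lets an analyst see how much of an estimate is attributable to team size rather than to product size, which is the intuitive appeal the annotation refers to.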


Troy81 

Troy, D. A. and S. H. Zweben. “Measuring the
Quality of Structured Designs.” J. Syst. and Software
2, 2 (June 1981), 113-120.

Abstract:  Investigates  the  possibility  of  providing 
some  useful  measures  to  aid  in  the  evaluation  of 
software  designs.  Such  measurements  should  allow 
some  degree  of  predictability  in  estimating  the  qual¬ 
ity  of  a  coded  software  product  based  upon  its  de¬ 
sign  and  should  allow  identification  and  correction 
of  deficient  designs  prior  to  the  coding  phase,  thus 
providing  lower  software  development  costs.  The 
study  involves  the  identification  of  a  set  of  hypoth¬ 
esized  measures  of  design  quality  and  the  collection 
of  these  measures  from  a  set  of  designs  for  a  soft¬ 
ware  system  developed  in  industry.  In  addition,  the 
number  of  modifications  made  to  the  coded  soft¬ 
ware  that  resulted  from  these  designs  was  collected. 

A  data  analysis  was  performed  to  identify  relation¬ 
ships  between  the  measures  of  design  quality  and 
the  number  of  modifications  made  to  the  coded  pro¬ 
grams.  The  results  indicated  that  module  coupling 
was  an  important  factor  in  determining  the  quality 
of  the  resulting  product.  The  design  metrics  ac¬ 
counted  for  roughly  50-60  percent  of  the  variability 
in  the  modification  data,  which  supports  the  find¬ 
ings  of  previous  studies.  Finally,  the  weaknesses  of 
the  study  are  identified  and  proposed  improvements 
are  suggested. 

The  authors  attempt  to  correlate  software  design 
parameters,  as  taken  from  structure  charts,  with 
software  quality,  as  measured  by  defect  counts. 

Walston77 

Walston, C. E. and C. P. Felix. “A Method of Programming
Measurement and Estimation.” IBM Systems
J. 16, 1 (1977), 54-73. Reprinted in [Putnam80],
238-257.

Abstract:  Improvements  in  programming  technol¬ 
ogy  have  paralleled  improvements  in  computing 
system  architecture  and  materials.  Along  with  in¬ 
creasing  knowledge  of  the  system  and  program  de¬ 
velopment  processes,  there  has  been  some  notable 
research  into  programming  project  measurement, 
estimation,  and  planning.  Discussed  is  a  method  of 
programming  project  productivity  estimation.  Also 
presented  are  preliminary  results  of  research  into 
methods  of  measuring  and  estimating  programming 
project  duration,  staff  size  and  computer  cost. 

This  is  a  classic  paper  in  the  area  of  software  project 
measurement  and  estimation;  it  is  based  on  statis¬ 
tical  analyses  of  historical  software  data.  It  dis¬ 
cusses  the  software  measurements  program  initiated 
in  1972  in  the  IBM  Federal  Systems  Division  as  an 
attempt  to  assess  the  effects  of  structured  program¬ 
ming  on  the  software  development  process.  At  the 
time  the  paper  was  written,  the  software  database 
contained data on 60 completed projects that ranged
from 4,000 to 467,000 LOC, and from 12 to 11,758
person-months  of  effort.  The  projects  represented 
28  high-level  languages,  and  66  computer  systems, 
and  were  classified  as  small  less-complex,  medium 
less-complex,  medium  complex,  and  large  complex 
systems.  After  obtaining  a  basic  relationship  be¬ 
tween  LOC  and  total  effort,  68  variables  were  in¬ 
vestigated  for  their  effects  on  productivity.  Of 
these,  29  were  found  to  correlate  with  productivity 
changes;  they  were  then  used  to  compute  a  produc¬ 
tivity  index  for  a  given  project. 
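
The basic relationship between LOC and total effort from this paper is commonly quoted as E = 5.2 * L^0.91, with E in person-months and L in thousands of delivered source lines. The sketch below assumes that form; the function name is hypothetical, and the productivity index built from the 29 variables is not reproduced.

```python
def walston_felix_effort(size_kloc: float) -> float:
    """Effort in person-months from E = 5.2 * L^0.91 (L in KLOC)."""
    if size_kloc <= 0:
        raise ValueError("size must be positive")
    return 5.2 * size_kloc ** 0.91


if __name__ == "__main__":
    # Sizes spanning the database range reported above (4 to 467 KLOC).
    for kloc in (4.0, 50.0, 467.0):
        print(f"{kloc:7.1f} KLOC -> {walston_felix_effort(kloc):8.1f} person-months")
```

An exponent below 1 means the fitted curve shows slightly improving productivity with size, which is a property of this data set rather than a general law.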

Wolverton74 

Wolverton, R. W. “The Cost of Developing Large-Scale
Software.” IEEE Trans. Computers C-23, 6
(June 1974), 615-636. Reprinted in [Putnam80],
282-303.

Abstract:  The  work  of  software  cost  forecasting 
falls  into  two  parts.  First  we  make  what  we  call 
structural  forecasts,  and  then  we  calculate  the  abso¬ 
lute  dollar-volume  forecasts.  Structural  forecasts 
describe  the  technology  and  function  of  a  software 
project,  but  not  its  size.  We  allocate  resources 
(costs)  over  the  project's  life  cycle  from  the  struc¬ 
tural  forecasts.  Judgement,  technical  knowledge, 
and  econometric  research  should  combine  in 
making  the  structural  forecasts.  A  methodology 
based  on  a  25  x  7  structural  forecast  matrix  that 
has  been  used  by  TRW  with  good  results  over  the 
past  few  years  is  presented  in  this  paper.  With  the 
structural  forecast  in  hand,  we  go  on  to  calculate 
the  absolute  dollar-volume  forecasts.  The  general 
logic  followed  in  "absolute"  cost  estimating  can  be 
based  on  either  a  mental  process  or  an  explicit  al¬ 
gorithm.  A  cost  estimating  algorithm  is  presented 
and  five  traditional  methods  of  software  cost 
forecasting  are  described:  top-down  estimating, 
similarities  and  differences  estimating,  ratio  es¬ 
timating,  standards  estimating,  bottom-up  estimat¬ 
ing.  All  forecasting  methods  suffer  from  the  need 
for  a  valid  cost  data  base  for  many  estimating  situa¬ 
tions.  Software  information  elements  that  experi¬ 
ence  has  shown  to  be  useful  in  establishing  such  a 
data  base  are  given  in  the  body  of  the  paper.  Major 
pricing  pitfalls  are  identified.  Two  case  studies  are 
presented  that  illustrate  the  software  cost  forecast¬ 
ing  methodology  and  historical  results.  Topics  for 
further  work  and  study  are  suggested. 

This  is  a  classic  paper,  for  Wolverton’s  model  is 
one  of  the  best-known  cost  estimation  models  de¬ 
veloped  in  the  early  1970s.  The  method  is  based 
upon  using  historical  data  from  previous  projects. 
Estimating  the  cost  for  a  software  module  consists 
of  three  steps:  first,  estimating  the  type  of  software 
module; second, estimating the difficulty (complexity)
based upon a six-point scale; and third, estimating
the size (LOC) of the module. Once these three
factors have been estimated, the cost of the module
can  be  computed  from  historical  cost  data  for 
similar  projects.  The  cost  of  the  software  system  is 
then  simply  the  sum  of  the  costs  for  all  modules. 
Like  most  such  models,  it  may  work  well  in  the 
environment  for  which  it  was  developed  but  cannot 
be  used  in  other  environments  without  caution  and, 
probably,  recalibration  to  that  environment. 
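
The three-step procedure and the final summation can be sketched as a table lookup. The module types, the per-line rates, and all names below are invented for illustration; a real application would substitute cost data from the organization's own project history.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Module:
    name: str
    kind: str        # step 1: type of software module
    difficulty: int  # step 2: rating on a six-point scale (1..6)
    size_loc: int    # step 3: estimated size in LOC


# Hypothetical historical cost per line, keyed by (kind, difficulty).
RATES: Dict[Tuple[str, int], float] = {
    ("algorithm", 3): 30.0,
    ("io", 1): 20.0,
    ("control", 6): 75.0,
}


def module_cost(module: Module, rates: Dict[Tuple[str, int], float]) -> float:
    """Cost of one module: estimated size times the historical rate."""
    return module.size_loc * rates[(module.kind, module.difficulty)]


def system_cost(modules: Iterable[Module], rates: Dict[Tuple[str, int], float]) -> float:
    """System cost is the sum of the module costs."""
    return sum(module_cost(m, rates) for m in modules)


if __name__ == "__main__":
    system = [Module("solver", "algorithm", 3, 1000), Module("report", "io", 1, 500)]
    print(system_cost(system, RATES))
```

The rate table is the part that ties the method to one environment, which is why recalibration is needed before using it elsewhere.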

Woodfield81 

Woodfield, S. N., V. Y. Shen, and H. E. Dunsmore.
“A Study of Several Metrics for Programming
Effort.” J. Syst. and Software 2, 2 (June 1981),
97-103.

Abstract:  As  the  cost  of  programming  becomes  a 
major  component  of  the  cost  of  computer  systems,  it 
becomes  imperative  that  program  development  and 
maintenance  be  better  managed.  One  measurement 
a  manager  could  use  is  programming  complexity. 
Such  a  measure  can  be  very  useful  if  the  manager  is 
confident  that  the  higher  the  complexity  measure  is 
for  a  programming  project,  the  more  effort  it  takes 
to  complete  the  project  and  perhaps  to  maintain  it. 
Until  recently  most  measures  of  complexity  were 
based  only  on  intuition  and  experience.  In  the  past 
3  years  two  objective  metrics  have  been  introduced, 
McCabe’s cyclomatic number v(G) and Halstead’s
effort  measure  E.  This  paper  reports  an  empirical 
study  designed  to  compare  these  two  metrics  with  a 
classic  size  measure,  lines  of  code.  A  fourth  metric 
based  on  a  model  of  programming  is  introduced 
and  shown  to  be  better  than  the  previously  known 
metrics  for  some  experimental  data. 

Four  software  metrics — LOC,  McCabe’s  v(G), 
Halstead’s  E,  and  an  author-modified  E  metric — are 
compared  to  observed  program  development  times. 
The  authors  introduce  the  “Logical  Module  Hypoth¬ 
esis”  as  support  for  a  modification  of  the  E  metric. 
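
As background for the comparison, Halstead's E is usually computed from four token counts: n1 and n2 (distinct operators and operands) and N1 and N2 (total operators and operands). With N = N1 + N2 and n = n1 + n2, volume is V = N * log2(n), difficulty is D = (n1 / 2) * (N2 / n2), and E = D * V. The sketch below assumes those standard definitions; it does not implement the authors' modified E metric, and the function name is hypothetical.

```python
import math


def halstead_effort(n1: int, n2: int, big_n1: int, big_n2: int) -> float:
    """Halstead's E = D * V from operator and operand counts."""
    length = big_n1 + big_n2            # N: total tokens
    vocabulary = n1 + n2                # n: distinct tokens
    volume = length * math.log2(vocabulary)
    difficulty = (n1 / 2.0) * (big_n2 / n2)
    return difficulty * volume


if __name__ == "__main__":
    # Hypothetical counts for a small routine.
    print(halstead_effort(n1=4, n2=4, big_n1=8, big_n2=8))
```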

Woodward79 

Woodward, M. R., M. A. Hennell, and D. Hedley.
“A Measure of Control Flow Complexity in Program
Text.” IEEE Trans. Software Eng. SE-5, 1 (Jan.
1979), 45-50.

Abstract:  This  paper  discusses  the  need  for  meas¬ 
ures  of  complexity  and  unstructuredness  of  pro¬ 
grams.  A  simple  language  independent  concept  is 
put  forward  as  a  measure  of  control  flow  complexity 
in  program  text  and  is  then  developed  for  use  as  a 
measure  of  unstructuredness.  The  proposed  metric 
is  compared  with  other  metrics,  the  most  notable  of 
which  is  the  cyclomatic  complexity  measure  of 
McCabe.  Some  experience  with  automatic  tools  for 
obtaining  these  metrics  is  reported. 

The  concept  of  a  “knot”  as  a  measure  of  program 
complexity  is  introduced  and  compared  with 
McCabe’s  v(G). 
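
A knot can be pictured as a crossing between two control-transfer arrows drawn beside the program listing. The sketch below counts such crossings, assuming each jump is given as a (from_line, to_line) pair and that two jumps form a knot when exactly one endpoint of one lies strictly inside the span of the other. The function name and input representation are assumptions for illustration.

```python
from itertools import combinations
from typing import Iterable, Tuple

Jump = Tuple[int, int]  # (from_line, to_line)


def count_knots(jumps: Iterable[Jump]) -> int:
    """Count pairs of jumps whose line spans partially overlap (cross)."""

    def crosses(j: Jump, k: Jump) -> bool:
        lo1, hi1 = sorted(j)
        lo2, hi2 = sorted(k)
        # Nested or disjoint spans do not cross; interleaved spans do.
        return lo1 < lo2 < hi1 < hi2 or lo2 < lo1 < hi2 < hi1

    return sum(1 for j, k in combinations(list(jumps), 2) if crosses(j, k))


if __name__ == "__main__":
    # Hypothetical jumps: a forward jump 1->5 interleaved with 3->8.
    print(count_knots([(1, 5), (3, 8)]))
```

Unlike v(G), this count depends on the textual ordering of statements, so rearranging code without changing its flow graph can change the result.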

Yau80

Yau, S. S. and J. S. Collofello. “Some Stability
Measures  for  Software  Maintenance.”  IEEE  Trans. 
Software  Eng.  SE-6,  6  (Nov.  1980),  545-552. 

Abstract:  Software  maintenance  is  the  dominant 
factor  contributing  to  the  high  cost  of  software.  In 
this  paper,  the  software  maintenance  process  and 
the  important  software  quality  attributes  that  affect 
the  maintenance  effort  are  discussed.  One  of  the 
most  important  quality  attributes  of  software  main¬ 
tainability  is  the  stability  of  a  program,  which  in¬ 
dicates  the  resistance  to  the  potential  ripple  effect 
that  the  program  would  have  when  it  is  modified. 
Measures  for  estimating  the  stability  of  a  program 
and  the  modules  of  which  the  program  is  composed 
are  presented,  and  an  algorithm  for  computing 
these  stability  measures  is  given.  An  algorithm  for 
normalizing  these  measures  is  also  given.  Applica¬ 
tions  of  these  measures  during  the  maintenance 
phase  are  discussed  along  with  an  example.  An 
indirect  validation  of  these  stability  measures  is 
also  given.  Future  research  efforts  involving  ap¬ 
plications  of  these  measures  during  the  design 
phase,  program  restructuring  based  on  these  meas¬ 
ures,  and  the  development  of  an  overall  maintain¬ 
ability  measure  are  also  discussed. 

Yau85 

Yau, S. S. and J. S. Collofello. “Design Stability
Measures For Software Maintenance.” IEEE Trans.
Software Eng. SE-11, 9 (Sept. 1985), 849-856.

Abstract:  The  high  cost  of  software  during  its  life 
cycle  can  be  attributed  largely  to  software  mainte¬ 
nance  activities,  and  a  major  portion  of  these  activi¬ 
ties  is  to  deal  with  the  modifications  of  the  software. 

In  this  paper,  design  stability  measures  which  in¬ 
dicate  the  potential  ripple  effect  characteristics  due 
to  modifications  of  the  program  at  design  level  are 
presented.  These  measures  can  be  generated  at  any 
point  in  the  design  phase  of  the  software  life  cycle 
which  enables  early  maintainability  feedback  to  the 
software  developers.  The  validation  of  these  meas¬ 
ures  and  future  research  efforts  involving  the  devel¬ 
opment  of  a  user-oriented  maintainability  measure, 
which  incorporates  the  design  stability  measures  as 
well  as  other  design  measures,  are  discussed. 

The  approach  taken  is  based  upon  the  data  abstrac¬ 
tion  and  information  hiding  principles  discussed  by 
D. L. Parnas. Thus, the metrics defined assume a
modular program structure and should be applicable
to software designs employing modern programming
practices. A design stability measure (DS) is
computed  for  each  module,  and  these  values  are 
then  used  to  compute  a  program  design  stability 
measure  (PDS)  for  the  whole  program.  The  design 
stability  measures  are  based  upon  the  assumptions 
buried in the module designs, and the potential ripple
effects upon other modules if a module is modified.
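
The relation between the per-module and whole-program measures can be sketched as below. The normalization used, DS = 1 / (1 + ripple count) for a module and PDS = 1 / (1 + total ripple count) for the program, is an assumption adopted for illustration, as are the function names and the sample counts; the paper should be consulted for the exact definitions and for how ripple effects are counted from design assumptions.

```python
from typing import Mapping


def design_stability(ripple_count: int) -> float:
    """Assumed module measure: 1.0 when no other module is affected."""
    return 1.0 / (1.0 + ripple_count)


def program_design_stability(ripple_counts: Mapping[str, int]) -> float:
    """Assumed program measure based on the summed ripple counts."""
    return 1.0 / (1.0 + sum(ripple_counts.values()))


if __name__ == "__main__":
    # Hypothetical design-level ripple counts per module.
    counts = {"parser": 0, "scheduler": 1, "storage": 3}
    for name, ripple in counts.items():
        print(name, design_stability(ripple))
    print("program", program_design_stability(counts))
```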